SSAS - Dwbi Tutorials
Introduction
There are three standard storage modes in OLAP applications (MOLAP, ROLAP and HOLAP). They affect the performance of OLAP queries and cube processing, the storage requirements, and the storage locations. SSAS (2005 and 2008) supports not only these three standard storage modes but also proactive caching, a feature introduced with SSAS 2005 that lets you combine the best of both worlds (ROLAP and MOLAP storage) for both frequency of data refresh and OLAP query performance. In part 1 of this article, I will give an overview of each mode, and in part 2 I will cover the proactive-caching feature of SSAS.
o All calculations are pre-generated when the cube is processed and are stored locally on the OLAP server, so even complex calculations that are part of a query result are performed quickly.
o MOLAP needs a connection to the underlying relational database only at processing time. Because it stores both the detail and the aggregate data on the OLAP server, the data can be viewed even when there is no connection to the relational database.
o MOLAP compresses the data it stores on the OLAP server, so it needs less storage than a relational database for the same amount of data. (Note, however, that beginning with SQL Server 2008 you can use data compression at the relational database level as well.)
Cons
o With MOLAP mode, you need frequent processing to pull in data that changed after the last processing, which is a drain on system resources.
o Latency: changes made in the relational database after processing are not reflected on the OLAP server until the cube is re-processed.
o MOLAP stores a copy of the relational data on the OLAP server and so requires additional investment in storage.
o If the data volume is high, cube processing can take a long time, though you can use incremental processing to reduce this.
o A permanent connection to the underlying database must be maintained to view the cube data.
Note: If you use ROLAP storage mode and your relational database is SQL Server, the Analysis Services server may create indexed views for aggregations. However, this requires a few prerequisites to be met. For example, the data source must be a table, not a view, and the table name must use the two-part naming convention, that is, it must be qualified with the owner/schema name. For a complete list of these prerequisites, refer to the link provided in the reference section.
This mode is a hybrid of MOLAP and ROLAP and attempts to provide the greater data capacity of ROLAP together with the fast processing and high query performance of MOLAP. In HOLAP storage mode, the cube detail data remains in the underlying relational data store and the aggregations are stored on the OLAP server. If you query only summary data held in aggregations, HOLAP works much like MOLAP. For detail data queries, HOLAP drills through to the detail data in the underlying relational data store, so performance is not as good as MOLAP. In other words, a query is as fast as MOLAP if its result can be provided from the query cache or from aggregations, but performance degrades if it needs detail data from the relational data store.
Pros
o HOLAP balances the disk space requirement, as it stores only the aggregate data on the OLAP server and the detail data remains in the relational database. No duplicate copy of the detail data is maintained.
o Since HOLAP does not store detail data on the OLAP server, the cube and its partitions are smaller than MOLAP cubes and partitions.
o Performance is better than ROLAP, as in HOLAP the summary data is stored on the OLAP server and queries can be satisfied from this summary data.
o HOLAP is optimal in scenarios where fast query response is required and query results are based on aggregations over large volumes of data.
Cons
o Query performance (response time) degrades if a query has to drill through to the detail data in the relational data store. In this case HOLAP performs very much like ROLAP.
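The split between locally stored aggregations and relationally stored detail can be pictured with a small toy model. This is only a sketch in Python, not how Analysis Services is implemented: the class `HolapCube`, its methods and the sample rows are all hypothetical names and values invented for illustration.

```python
from collections import defaultdict

# Hypothetical detail rows as they might sit in the relational store:
# (year, product, sales_amount). The values are made up for illustration.
RELATIONAL_ROWS = [
    (2007, "Bike", 100.0),
    (2007, "Helmet", 20.0),
    (2008, "Bike", 150.0),
    (2008, "Helmet", 30.0),
]


def build_aggregations(rows):
    """Pre-compute sales per year, standing in for cube processing."""
    totals = defaultdict(float)
    for year, _product, amount in rows:
        totals[year] += amount
    return dict(totals)


class HolapCube:
    """Toy HOLAP model: aggregations are local, detail stays relational."""

    def __init__(self, rows):
        self._relational = rows                         # stays in the source database
        self._aggregations = build_aggregations(rows)   # kept on the "OLAP server"
        self.relational_reads = 0                       # counts trips to the source

    def sales_by_year(self, year):
        # Summary query: answered from the stored aggregations, like MOLAP.
        return self._aggregations[year]

    def sales_detail(self, year, product):
        # Detail query: has to scan the relational rows, like ROLAP.
        self.relational_reads += 1
        return sum(a for y, p, a in self._relational if y == year and p == product)


cube = HolapCube(RELATIONAL_ROWS)
print(cube.sales_by_year(2008))          # served from aggregations
print(cube.sales_detail(2008, "Bike"))   # served from the relational rows
print(cube.relational_reads)             # number of detail lookups so far
```

In this model only `sales_detail` touches the relational rows, which mirrors the point above that summary queries behave like MOLAP and detail queries behave like ROLAP.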
Conclusion
In part 1 of this article, I discussed the basic OLAP storage modes and their pros and cons, and then showed how you can configure the storage mode of OLAP objects in SSAS. In part 2 I will cover the proactive-caching feature of SSAS, which gives the administrator better control over the frequency of cube data refresh, so that the cube can serve near real-time data while still providing the query performance of MOLAP storage mode.
Following is a list of common terms used when working with SQL Server Analysis Services.
Cube - A cube is a multi-dimensional data structure composed of dimensions and measure groups. The intersection of the dimensions and measure groups contained in a cube returns the dataset.
Calculated Measure - Each field in a measure group is known as a base measure. Measures created using MDX expressions, with or without base measures, are known as calculated measures.
Data Source View - An insulation layer that inherits the basic schema from the data source, with the flexibility to manipulate the schema in this layer without modifying the actual schema in the data source.
Dimension - A dimension is an OLAP structure that contains attributes related to an entity and is used to categorize data on the row / column axis. A dimension almost never contains measurable numeric data, and if it does, that data is used as an attribute. Typical examples of dimensions are Geography, Organization, Employee, Time etc.
Fact - A fact, known as a measure group in a cube, is an OLAP structure that contains measurable numeric data for one or more entities. In cube parlance these entities are known as dimensions. A dimension need not be associated directly with a fact, but a fact is always associated directly with at least one dimension. Typical examples of facts are Sales, Performance, Tax etc.
Hierarchy - A hierarchy is a collection of nested attributes associated in a parent-child fashion with a defined cardinality. A dimension is formed of attributes, and a hierarchy contained in a dimension is formed of one or more attributes from the same dimension.
KPI - Key Performance Indicators are logical structures defined using MDX expressions. Each KPI has a goal, status, value, trend, and indicator associated with it. The value is derived from the definition of the KPI, and the rest vary based on this derived value. KPIs are the primary elements that make up a scorecard in a dashboard.
MDX - Multi Dimensional Expressions is the query language of multi-dimensional data structures. It can be considered the SQL of OLAP databases, with the major difference that MDX is mostly used for reading data only.
Named Set - A named set is a pre-defined MDX query defined in the script of the cube. It can be thought of as analogous to a view in a SQL Server database. Named sets can be dynamic or static, and this determines when the query gets evaluated.
OLAP - Online Analytical Processing is a term used to represent analytical data sources and analysis systems. The fundamental expectation associated with the term OLAP is that it involves multi-dimensional data and the environment hosting it.
Snowflake Schema - A snowflake schema is an OLAP schema where one or more normalized dimension tables are associated with a fact table. For example, Product Sub Category -> Product Category -> Product can be three normalized dimension tables, and the Product table can be associated with a fact table like Sales. This is a very common example of a snowflake schema.
Star Schema - A star schema is an OLAP schema where all dimension tables are directly associated with fact tables, and no normalized dimension tables are used in the schema. For example, the Time, Product and Geography dimension tables would be directly associated with a fact table like Sales. This is a very common example of a star schema.
We have named this new calculated measure TotalSales. The Parent hierarchy setting specifies which hierarchy the measure will be part of, and in this case it is Measures. It's a built-in hierarchy and all measures normally fall under it. In the Expression box, we can specify any MDX expression. Here we are adding Internet Sales Amount from the FactInternetSales measure group and Reseller Sales Amount from the FactResellerSales measure group. You do not need to type the values; you can drag and drop them from the panes on the left-hand side of the window. Under Additional Properties you can set further options for this measure. Save your solution. In the next section we will create named sets and then deploy both at the same time.
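To make the idea of a calculated measure concrete, here is a minimal sketch in Python (not MDX) of what TotalSales computes for each cell: the sum of two base measures. The dictionaries and their values are hypothetical sample data, and treating a missing value as zero is an assumption of this sketch.

```python
# Hypothetical base measures per product, standing in for the two measure groups.
internet_sales = {"Bike": 120.0, "Helmet": 35.0}
reseller_sales = {"Bike": 300.0, "Gloves": 15.0}


def total_sales(product):
    """Calculated measure: internet sales plus reseller sales for one product.

    A product missing from one of the sources contributes 0.0 (an assumption
    of this sketch).
    """
    return internet_sales.get(product, 0.0) + reseller_sales.get(product, 0.0)


for product in sorted(set(internet_sales) | set(reseller_sales)):
    print(product, total_sales(product))
```

The point of defining this once in the cube is that every client sees the same TotalSales, instead of each user re-creating the addition in their own report.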
Here we are creating two named sets, Internet Sales Top 25 and Reseller Sales Top 25, which return the top 25 products based on Internet Sales and Reseller Sales respectively. In the formula, the MDX function TopCount returns the top 25 members of the set. In the Type selection, we can choose whether the named set is static or dynamic. We have selected Dynamic, as we want to create a dynamic named set. In the Display folder selection, we can specify where the named sets will appear. By default, named sets appear in the last dimension that is used in the formula. Here we have used an attribute hierarchy from the Product dimension, so the named sets should appear in that dimension under the Named Sets directory. Save and deploy the solution, and then re-connect to the cube in the Browser pane. You should be able to see the calculated measure and named sets as shown in the below screenshot.
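The behaviour of TopCount used in the named sets above can be sketched in a few lines of Python. This is an illustration of the idea only, not the MDX function itself; `top_count` and the sample figures are hypothetical, and the example uses a top 2 instead of a top 25 to stay short.

```python
# Hypothetical internet sales per product (made-up values).
product_sales = {
    "Bike": 500.0,
    "Helmet": 120.0,
    "Gloves": 40.0,
    "Bottle": 15.0,
}


def top_count(members, count, measure):
    """Return the `count` members with the highest measure value, highest first."""
    return sorted(members, key=measure, reverse=True)[:count]


top_two = top_count(product_sales, 2, product_sales.get)
print(top_two)
```

A named set wraps such an expression under a name so that users can drop "Internet Sales Top 25" onto an axis without writing the ranking logic themselves.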
In the next step specify the SSAS server name and logon credentials. If you have everything on the local machine, you can also use localhost as the server name.
If you were able to successfully connect to the specified SSAS instance with the logon credentials specified, in the next step you should be able to select the SSAS Sales database and find the Sales Cube. Select the Sales Cube and proceed to the next step.
In the next step, specify the name of the connection file to save. This file will be saved as an .ODC file and you can reuse this connection file when you want to use the same connection in other workbooks.
After saving the file, you will be prompted with the option to select the kind of report you want to create. We will go with the default option and select PivotTable Report.
After selecting PivotTable Report, a designer will open with options to select dimensions, attributes and measures to populate your pivot table. Select the values as shown in the below screenshot. Our intention is to display the hierarchy we created in the Sales Territory dimension on the columns axis, the Internet Sales Top 25 named set on the rows axis, and the Total Sales calculated measure in the values area.
After making the above selections, your report should look like the below screenshot. Using the features available from the Options tab, you can format this report and give it a more professional look. You can try drilling down the hierarchy, and you will see that drill-down is only available where a hierarchy has been developed. Users who frequently want to see the top-selling products can pick any of the named sets that we defined earlier. Instead of having to define formulas that add internet sales and reseller sales, users can just select Total Sales.
We might also face errors during deployment, and we will attempt debugging and resolving these errors.
Right-click the solution and select Deploy; this will start deploying the solution. If you have not specified an appropriate account in the impersonation information, your deployment might fail because the account does not have sufficient privileges. If you have followed all the previous steps as explained, you should see errors as shown below. From the error message you can make out that cube processing failed due to the Date dimension. Right-click the Cube Dim Date dimension and select Process, and you will find the following error.
If you recall, we defined a hierarchy in the Date dimension, Year -> Semester -> Quarter -> Month, and the attribute relationship expected at each level is one to many. If you browse the data, you will find that the same set of semester values exists in every year, so a Semester value on its own does not identify a single semester. When the dimension is processed, duplicate Semester keys are found, because by default the key column of the Semester attribute is Semester itself, which is not unique across years. So we need to make each attribute unique by changing its key columns.
Edit the Date dimension in the dimension editor, select the Semester attribute and edit the Key Columns property. This should bring up a pop-up window as shown below. To make the Semester attribute unique, we need to make its key a composite of Year + Semester. So select the key columns as shown below.
When you select multiple columns as the key, the Name Column property becomes blank, and it's a mandatory property. So select this property and set it to Semester again, as we want to display semesters when this attribute is browsed.
This should solve the error we were facing on the date dimension. Duplicate keys are one of the most common errors during dimension processing and we just learned how to resolve this issue.
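The reasoning behind the composite key can be checked on a handful of rows. The following Python sketch uses hypothetical date rows (not read from the AdventureWorks database) and a hypothetical helper, `ambiguous_keys`, that lists the keys appearing under more than one parent.

```python
from collections import defaultdict

# Hypothetical rows from a date table: (calendar_year, semester, quarter).
date_rows = [
    (2007, 1, 1), (2007, 1, 2), (2007, 2, 3), (2007, 2, 4),
    (2008, 1, 1), (2008, 1, 2), (2008, 2, 3), (2008, 2, 4),
]


def ambiguous_keys(rows, key_of, parent_of):
    """Return the keys that appear under more than one distinct parent."""
    parents = defaultdict(set)
    for row in rows:
        parents[key_of(row)].add(parent_of(row))
    return sorted(key for key, seen in parents.items() if len(seen) > 1)


# Key = Semester alone: every semester value occurs in both years.
semester_only = ambiguous_keys(date_rows, key_of=lambda r: r[1], parent_of=lambda r: r[0])

# Key = (Year, Semester): each key now belongs to exactly one year.
year_semester = ambiguous_keys(date_rows, key_of=lambda r: (r[0], r[1]), parent_of=lambda r: r[0])

print(semester_only)   # semesters that are ambiguous on their own
print(year_semester)   # empty list when the composite key is unique
```

With Semester as the only key, both semester values are ambiguous; with the Year + Semester composite, none are, which is the condition the one-to-many attribute relationship needs.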
Deploy the cube again; this time it should deploy successfully. After successful deployment, go to the Browser pane, connect to the cube and browse the data by dragging and dropping dimension attributes and measures onto the browsing area. Below is an example.
Designing a Cube
Using BIDS, after the DSV is developed, the next step is to create dimensions. Dimensions are of two types: database dimensions and cube dimensions. Database dimensions can be perceived as a master template, and cube dimensions as instances / children of this master template. We will start our development with the creation of database dimensions. If you consider a dimension as a table, all the fields in this table can be perceived as attributes. A hierarchy in a dimension is a group of attributes logically related to each other with a defined cardinality. Finally, we will create a cube from the dimensions we just developed and the fact tables, which gives us cube dimensions and measure groups.
Creating a Dimension
A dimension defined at the solution level is termed a database dimension, and one defined inside a cube is termed a cube dimension. The Dimension Wizard is the primary means of creating a dimension. We will create dimensions from the three dimension tables which we have included in our schema. Right-click the Dimensions folder and select New Dimension; this will invoke the Dimension Wizard. The first screen should look like the below screenshot. You have the options of using an existing table, creating a table in the data source, or using a template. We already have the dimension table in our schema and we will use it, so select Use an existing table and click Next.
Select the DSV we created earlier in the DSV selection. We intend to create a dimension from the DimSalesTerritory table, so select that table. Every dimension table needs a key attribute, and in this table SalesTerritoryKey is the primary key column, which is guaranteed to identify each record uniquely. It would not make sense to browse this attribute by its key. The SalesTerritoryRegion field also has unique values, so we could use it as both the key and the name column. But for the purpose of our exercise, we will use the SalesTerritoryKey field as the key column and SalesTerritoryRegion as the name column. Using the key field may look unnecessary here, but when you are starting to develop an understanding of dimensions, it helps to set a rule in your mind: a key field is always required and is mostly a surrogate key, and you can set the name column to any field that makes browsing convenient.
In the next screen, you need to select the attributes that will be present in the dimension. If you uncheck the Enable Browsing checkbox for an attribute, it won't be visible to client applications when they browse the dimension. Attributes can be of different types, and you can specify the type in the Attribute Type field. The Dimension Wizard leaves the name column you set for the key attribute out of this list, because it is already available through the key attribute. So you won't find that field in the list of available attributes.
The next step is to give the dimension a name. Name it Cube Dim Sales Territory or anything appropriate. After this step you have completed creating your first dimension.
In a similar manner, create the Product and Date dimensions using the Dimension Wizard.
Creating a Hierarchy
A hierarchy is a set of logically related attributes with a fixed cardinality. While browsing the data, a hierarchy exposes the top-level attribute, which can be broken down into lower-level attributes. For example, Year -> Semester -> Quarter -> Month is a hierarchy. While analyzing the data, it might be required to drill down from a higher level to a detail level, and exposing data as a hierarchy is one of the best solutions for this. Creating a hierarchy is as easy as dragging and dropping attributes into the hierarchy pane of the dimension editor. We want to create a hierarchy in the Sales Territory dimension. Open the Sales Territory dimension in the dimension editor, drag and drop attributes into the hierarchy pane, click on each of them and rename them to something appropriate. After completing this, your hierarchy should look similar to the below screenshot.
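Drilling down a hierarchy amounts to re-aggregating the same facts at a deeper level. The Python sketch below illustrates this for a Group -> Country -> Region hierarchy; the rows, the sales values and the `roll_up` helper are hypothetical and only meant to show the idea.

```python
from collections import defaultdict

# Hypothetical territory rows: (group, country, region, sales).
territory_rows = [
    ("North America", "United States", "Northwest", 50.0),
    ("North America", "United States", "Southwest", 70.0),
    ("North America", "Canada", "Canada", 30.0),
    ("Europe", "France", "France", 40.0),
]


def roll_up(rows, depth):
    """Sum sales at a hierarchy depth: 1 = group, 2 = country, 3 = region."""
    totals = defaultdict(float)
    for *path, sales in rows:
        totals[tuple(path[:depth])] += sales
    return dict(totals)


by_group = roll_up(territory_rows, 1)     # top level of the hierarchy
by_country = roll_up(territory_rows, 2)   # one level of drill-down
print(by_group)
print(by_country)
```

Each drill-down step keeps the parent path and adds one more level, so the totals at a lower level always add up to the total of their parent.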
You will find a warning icon on the hierarchy pane, which says that attribute relationships are missing between these attributes. Country has a one-to-many relationship with Region, and Group has a one-to-many relationship with Country. But these relationships need to be defined explicitly in the dimension. Click on the Attribute Relationships tab, right-click the Region attribute and select New Attribute Relationship. Set the values as shown in the below screenshot to correct the relationships between these attributes.
After you have applied the above changes, your attribute relationship tab should look like the below screenshot.
If you have observed carefully, relationships are of two types: Rigid and Flexible. This has an effect on the processing of the cube. Rigid means that you do not expect the relationship to change, and Flexible means that relationship values can change. In our dataset, Group is a logical way to categorize countries and it can change, while regions within a country change rarely or not at all. So the relationship type between Country and Group should be flexible, and the relationship type between Region (the sales territory key) and Country should be rigid. Double-click on the arrow joining the key attribute and Country, and change the relationship type as shown below.
Check the hierarchy pane, and you should find that the warning icon is no longer visible. You can change the name of the hierarchy to something appropriate. In the interest of beginners who might get confused by the distinction between attributes and hierarchies, we will keep the name as Hierarchy. Edit the Date dimension, and create a Year -> Semester -> Quarter -> Month hierarchy in it.
In the next screen, we need to select the tables which will be used to create measure groups. We already have a DSV which has fact tables in the schema. So we will use this as shown in the below screenshot.
In the next screen, we need to select the measures that we want to create from the fact tables we just selected in the previous screen. For now, select all the fields as shown below and move to the next screen.
In this screen you need to select any existing dimensions. We have created three dimensions and we will include all of these dimensions as shown below.
In the next screen, we can choose whether to create any additional new dimensions from the tables available in the DSV. We do not want to create any more dimensions, so clear any selected tables as shown below and move to the next screen.
Finally you need to name your cube, which is the last step of the wizard before your cube is created. Name it something appropriate like Sales Cube as shown below.
Your cube should now have been created, and if your cube editor is open you should find different tabs to configure and design various features and aspects of the cube. If you look carefully at the below screenshot, you will find the FactInternetSales and FactResellerSales measure groups. You will also find the Sales Territory and Product dimensions, but the Date dimension is missing. Both fact tables have multiple fields referencing the DateKey from the Date dimension. BIDS intelligently creates three cube dimensions from the Date dimension and names each one after the field that references it. So you will find three instances of the Date dimension: the Ship Date, Due Date and Order Date dimensions. These are known as role-playing dimensions.
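A role-playing dimension is one dimension table looked up through several different foreign keys. The Python sketch below shows that idea with a single hypothetical fact row; the dictionary `ROLES`, the helper `resolve_roles` and all values are illustrative assumptions, not output from the cube.

```python
# One hypothetical date dimension, keyed by DateKey.
dim_date = {
    20080101: "2008-01-01",
    20080105: "2008-01-05",
    20080113: "2008-01-13",
}

# One hypothetical fact row with three foreign keys into the same dimension.
fact_row = {
    "OrderDateKey": 20080101,
    "ShipDateKey": 20080105,
    "DueDateKey": 20080113,
    "SalesAmount": 100.0,
}

# Each role is the same dimension reached through a different fact column.
ROLES = {
    "Order Date": "OrderDateKey",
    "Ship Date": "ShipDateKey",
    "Due Date": "DueDateKey",
}


def resolve_roles(fact, dimension, roles):
    """Look up the same dimension once per role, using that role's foreign key."""
    return {role: dimension[fact[column]] for role, column in roles.items()}


resolved = resolve_roles(fact_row, dim_date, ROLES)
print(resolved)
```

There is still only one `dim_date` table in this sketch; the three "dimensions" differ only in which fact column is used for the lookup.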
Overview
SQL Server Analysis Services (SSAS) is the technology in the Microsoft Business Intelligence stack for developing Online Analytical Processing (OLAP) solutions. In simple terms, you can use SSAS to create cubes from the data in data marts or a data warehouse for deeper and faster data analysis.
Cubes are multi-dimensional data sources which have dimensions and facts (also known as measures) as their basic constituents. From a relational perspective, dimensions can be thought of as master tables and facts as measurable details. These details are generally stored in a pre-aggregated proprietary format, and users can analyze huge amounts of data and slice this data by dimensions very easily. Multidimensional Expressions (MDX) is the query language used to query a cube, similar to the way T-SQL is used to query a table in SQL Server.
Simple examples of dimensions are product / geography / time / customer, and simple examples of facts are orders / sales. A typical analysis could be to analyze sales in the Asia-Pacific geography during the past 5 years. You can think of this data as a pivot table where geography is the column axis, years are the row axis, and sales are the values. Geography can have its own hierarchy like Country->State->City. Time can have its own hierarchy like Year->Semester->Quarter. Sales could then be analyzed using any of these hierarchies for effective data analysis.
A typical high-level cube development process using SSAS involves the following steps:
1) Reading data from a dimensional model
2) Configuring a schema in BIDS (Business Intelligence Development Studio)
3) Creating dimensions, measures and cubes from this schema
4) Fine tuning the cube as per the requirements
5) Deploying the cube
In this tutorial we will step through a number of topics that you need to understand in order to successfully create a basic cube. Our high-level outline is as follows:
o Design and develop a star schema
o Create dimensions, hierarchies, and cubes
o Process and deploy a cube
o Develop calculated measures and named sets using MDX
o Browse the cube data using Excel as the client tool
When you start learning SSAS, you should have a reasonable relational database background. But when you start working in a multi-dimensional environment, you need to stop thinking from a two-dimensional (relational database) perspective, and this ability develops over time. In this tutorial, we will also try to develop an understanding of OLAP development through the eyes of an OLTP practitioner.
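The pivot-table picture from the overview (years on the rows, geography on the columns, sales as the values) can be sketched in plain Python. The rows and amounts below are hypothetical, and `pivot` is just an illustrative helper, not part of any SSAS tooling.

```python
from collections import defaultdict

# Hypothetical sales facts: (year, country, sales_amount).
sales_rows = [
    (2007, "Australia", 10.0),
    (2007, "Japan", 5.0),
    (2008, "Australia", 12.0),
    (2008, "Australia", 3.0),
]


def pivot(rows):
    """Aggregate sales into {year: {country: total}}, i.e. rows by columns."""
    table = defaultdict(lambda: defaultdict(float))
    for year, country, amount in rows:
        table[year][country] += amount
    return {year: dict(columns) for year, columns in table.items()}


sales_pivot = pivot(sales_rows)
for year in sorted(sales_pivot):
    print(year, sales_pivot[year])
```

A cube does the same kind of slicing, but over many dimensions at once and usually from pre-aggregated storage rather than by scanning rows at query time.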
AdventureWorks Data Warehouse 2008R2 is the database we need for our exercises. Point the installer to the SQL Server instance that you are using, and install the database. After the database is installed, open SQL Server Management Studio to verify the databases that were installed. You should find something similar to the below screenshot.
Expand the database highlighted above and check out the different Dim and Fact tables in this database. The tables with the prefix Dim are suited to be used as dimension tables, and the tables with the prefix Fact are suited to be used as fact tables.
After this, you need to specify the impersonation information for the data source. This information specifies the credentials that Analysis Services uses to connect to the data source. Every time you deploy or process the solution, this connection information is used, so keep in mind that the account you use must have sufficient privileges. If you are not sure which account to use, you can use an account with administrator privileges on your development machine. Please keep in mind that this is not recommended and should not be done in production environments. It is suggested here only to get you started quickly with cube design and development.
After specifying this information, click Next. This should take you to the final screen where you need to name the data source. Name it something appropriate and click OK, which should create your data source.
A data warehouse or data mart from which we source our data could contain tens to hundreds of tables. Also, one would not have the liberty to change the schema of these tables to suit the requirements of the cube design. The Data Source View is an insulation layer between the actual data source and the solution. We can create and modify the schema we need in this layer, and it is used as the data source for the different objects we create in the solution.
A star schema is a schema structure where the dimension tables are directly connected to the fact table. If you imagine a fact table in the center with the dimensions attached to it, the figure looks similar to a star, hence the name star schema. It's the simplest form of schema, and hence we will use it in our exercise.
Right-click on Data Source Views and select New Data Source View, and a wizard should pop up with a Welcome screen. Select Next, and the next screen should prompt you to select a relational data source. Select the data source we just created and click Next. The next screen should prompt you to select the tables that we intend to use in our solution. Select the tables as shown in the below screenshot. These fact and dimension tables are chosen because they are interlinked with each other and also suit the requirements of the exercises to follow.
Select Next, give the DSV an appropriate name, and this should finally create your Data Source View. After arranging the tables in the DSV, your schema should look similar to the below screenshot.
In the above figure, you can see that both the fact tables are related to all three dimensions in the same manner. This is a typical case of a star schema. You can also browse the data, create calculated fields, assign primary keys and carry out other similar functions in this designer to modify the schema without modifying the actual schema in the database.
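The star shape described above, one fact table pointing directly at each dimension table, can be illustrated with a few in-memory rows. This Python sketch uses hypothetical keys and amounts; the column names follow the style of the tutorial's tables but are assumptions here, and `sales_by` is an illustrative helper.

```python
# Hypothetical dimension tables: surrogate key -> member name.
dim_product = {1: "Bike", 2: "Helmet"}
dim_territory = {10: "Northwest", 20: "France"}

# Hypothetical fact rows: each row references every dimension directly.
fact_sales = [
    {"ProductKey": 1, "SalesTerritoryKey": 10, "SalesAmount": 100.0},
    {"ProductKey": 2, "SalesTerritoryKey": 10, "SalesAmount": 20.0},
    {"ProductKey": 1, "SalesTerritoryKey": 20, "SalesAmount": 60.0},
]


def sales_by(dimension, key_column, facts):
    """Join the facts to one dimension and total SalesAmount per member name."""
    totals = {}
    for row in facts:
        name = dimension[row[key_column]]
        totals[name] = totals.get(name, 0.0) + row["SalesAmount"]
    return totals


by_product = sales_by(dim_product, "ProductKey", fact_sales)
by_territory = sales_by(dim_territory, "SalesTerritoryKey", fact_sales)
print(by_product)
print(by_territory)
```

Because every dimension is one lookup away from the fact row, the same function can slice the facts by any dimension just by changing the key column.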
What is the difference between SSAS 2005 and SSAS 2008?
1. In SSAS 2005 it is not possible to create an empty cube, but in SSAS 2008 it is.
2. A new feature in Analysis Services 2008 is the Attribute Relationships tab in the Dimension Designer. Implementing attribute relationships is more complex in SSAS 2005.
3. In SSAS 2005 we can create only 2000 partitions per measure group; this limit is removed in SSAS 2008.
You can give more differences, but if you cover these points the interviewer will feel that you are really experienced.
What is a data warehouse (DWH for short)?
The data warehouse is an informational environment that:
o Provides an integrated and total view of the enterprise
o Makes the enterprise's current and historical information easily available for decision making
o Makes decision-support transactions possible without hindering operational systems
o Renders the organization's information consistent
o Presents a flexible and interactive source of strategic information
OR
A data warehouse is a subject-oriented, integrated, time-variant, non-volatile collection of data in support of management's decision-making process. The terms in this definition mean the following:
o Subject oriented: It defines the specific business domain, e.g. banking, retail, insurance.
o Integrated: It should be able to integrate data from various source systems, e.g. SQL Server, Oracle, DB2.
o Time variant: It should be able to maintain the data for various time periods.
o Non volatile: Once data is inserted, it can't be changed.
What is a data mart?
A data mart is a subset of an organizational data store, usually oriented to a specific purpose or major data subject, that may be distributed to support business needs. Data marts are analytical data stores designed to focus on specific business functions for a specific community within an organization. Data marts are often derived from subsets of data in a data warehouse, though in the bottom-up data warehouse design methodology the data warehouse is created from the union of organizational data marts. There are three types of data mart:
1. Dependent
2. Independent
3. Logical
What is the difference between a data mart and a data warehouse?
A data warehouse holds the complete data, whereas a data mart is a subset of it. For example, all of an organisation's data, such as that of the finance, HR and banking departments, is stored in the data warehouse, whereas a data mart stores only the finance data or only the HR data. So a data warehouse is a collection of different data marts.
Have you ever worked on performance tuning? If yes, what are the steps involved in it?
We need to identify the bottlenecks in order to tune performance. To overcome the bottlenecks, we need to do the following:
1. Avoid named queries
2. Avoid unnecessary relationships between tables
3. Define proper attribute relationships
4. Use a proper aggregation design
5. Partition the data properly
6. Use a proper dimension usage design
7. Avoid unnecessary many-to-many relationships
8. Avoid unnecessary measures
9. Set AttributeHierarchyEnabled = FALSE on attributes that are not required
10. Do not include even a single measure that is not necessary
What are the difficulties faced in cube development?
This question is asked either to test whether you are really experienced or because the interviewer has run out of questions. You can mention any area you find difficult to work on, but the best answers are the following:
1. Defining attribute relationships
2. Calculations
3. Defining dimension usage (many-to-many relationships)
4. Analyzing the requirements
Explain the flow of creating a cube.
Steps to create a cube in SSAS:
1. Create a data source.
2. Create a data source view.
3. Create dimensions.
4. Create a cube.
5. Deploy and process the cube.
What is a data source or DS?
The data source is the physical connection information that Analysis Services uses to connect to the database that hosts the data. The data source contains the connection string, which specifies the server and the database hosting the data, as well as any necessary authentication credentials.
What is a data source view or DSV?
A data source view is a persistent set of tables from a data source that supply the data for a particular cube.
BIDS also includes a wizard for creating data source views, which you can invoke by right-clicking on the Data Source Views folder in Solution Explorer.
1. A data source view is the logical view of the data in the data source.
2. The data source view is the only thing a cube can see.

What is a named calculation?
A named calculation is a SQL expression represented as a calculated column. This expression appears and behaves as a column in the table. A named calculation lets you extend the relational schema of existing tables or views in a data source view without modifying the tables or views in the underlying data source. It is used to create a new column in the DSV from hard-coded values, from existing columns, or from both.

What is a named query?
A named query in a DSV is similar to a view in a database. It creates a virtual table in the DSV without affecting the underlying database. Named queries are mainly used to merge two or more tables in the data source view or to filter the columns of a table.

Why do we need named queries?
A named query is used to join multiple tables or to remove unnecessary columns from a table. You can achieve the same result in the database with views, but named queries are the best option when you do not have permission to create views in the database.

How will you add a new column to an existing table in a data source view?
By using a named calculation, as explained above.

What is a dimension table?
A dimension table contains the hierarchical data by which you would like to summarize. It holds specific business information, including the specific name of each member of the dimension. The name of a dimension member is called an attribute. The key attribute in the dimension must contain a unique value for each member of the dimension; it is bound to the primary key column of the dimension table. The primary key column of each dimension table corresponds to one of the key columns in any related fact table.

What is a fact table?
A fact table contains the basic information that you wish to summarize. The table that stores the detailed values for measures is called the fact table. Put simply, it is the table that contains the METRICS used to analyse the business. It consists of 2 sections:
1. Foreign keys to the dimensions
2. Measures/facts (numerical values used to monitor business activity)

What is a factless fact table?
This is a very important interview question. A factless fact table is similar to a fact table except that it has no measures; it holds only the links to the dimensions. These tables enable you to track events; indeed they are for recording events.
Factless fact tables are used for tracking a process or collecting statistics. They are so called because the table has no aggregatable numeric values. It contains only key values that reference the dimensions, from which the statistics can be collected.

What are attribute relationships, and why do we need them?
Attribute relationships tell the Analysis Services engine how the attributes in a dimension are related to each other. When proper relationships are defined, processing time decreases, and both cube processing performance and MDX query performance improve.
In Microsoft SQL Server Analysis Services, attributes within a dimension are always related, either directly or indirectly, to the key attribute. When you define a dimension based on a star schema, where all dimension attributes are derived from the same relational table, an attribute relationship is automatically defined between the key attribute and each non-key attribute of the dimension. When you define a dimension based on a snowflake schema, where dimension attributes are derived from multiple related tables, attribute relationships are automatically defined as follows:
1. Between the key attribute and each non-key attribute bound to columns in the main dimension table.
2. Between the key attribute and the attribute bound to the foreign key in the secondary table that links the underlying dimension tables.
3. Between the attribute bound to the foreign key in the secondary table and each non-key attribute bound to columns from the secondary table.

How many types of attribute relationships are there?
There are 2 types of attribute relationships:
1. Rigid
2. Flexible
Rigid: The relationship between the attributes is fixed; members do not change levels or move between their related members. Example: the time dimension.
We know that the month January 2009 belongs ONLY to the year 2009 and will never be moved to any other year.
Flexible: The relationship between the attributes can change over time. Example: an employee and a department. An employee can be in the Accounts department today and in the Marketing department tomorrow.

How many types of dimensions are there and what are they?
Three types are commonly asked about:
1. Conformed dimension
2. Junk dimension
3. Degenerate dimension

What are conformed dimensions, junk dimensions and degenerate dimensions?
Conformed dimension: A dimension that is shared across multiple fact tables or data models, with the same meaning in each. (It is related to, but not the same as, a role-playing dimension, which is covered later.)
Junk dimension: A number of very small dimensions may be lumped together to form a single dimension, a junk dimension, whose attributes are not closely related. Grouping random flags and text attributes and moving them into a separate sub-dimension is known as creating a junk dimension.
Degenerate dimension: A dimension whose values are stored in the fact table and which has no corresponding dimension table. In other words, a degenerate dimension is a dimension key without a corresponding dimension.
Example: In the PointOfSale Transaction fact table, we have Date Key (FK), Product Key (FK), Store Key (FK), Promotion Key (FK) and POS Transaction Number. The Date dimension corresponds to Date Key, the Product dimension corresponds to Product Key, and so on. In a traditional parent-child database, POS Transaction Number would be the key to the transaction header record that contains all the information valid for the transaction as a whole, such as the transaction date and store identifier. In this dimensional model, however, that information has already been extracted into other dimensions. POS Transaction Number therefore looks like a dimension key in the fact table but has no corresponding dimension table.

What are the types of database schema?
There are 3 types of database schema:
1. Star
2. Snowflake
3. Starflake

What are star, snowflake and starflake schemas?
Star schema: The fact table is linked directly to all dimension tables. The dimensions are denormalized, with each dimension represented by a single table, so a central fact table connects a number of individual dimension tables.
Snowflake: The snowflake schema is an extension of the star schema in which each point of the star explodes into more points. In a star schema each dimension is represented by a single dimension table, whereas in a snowflake schema that dimension table is normalized into multiple lookup tables, each representing a level in the dimensional hierarchy.
In a snowflake schema, some dimension tables are linked directly to the fact table, while others are reached through intermediate dimension tables.
Starflake: A hybrid structure that contains a mixture of star (denormalized) and snowflake (normalized) schemas.

How will you hide an attribute?
Set AttributeHierarchyVisible = False in the properties of the attribute.

How will you prevent an attribute from being processed?
Set AttributeHierarchyEnabled = False, so that no attribute hierarchy is built for the attribute.

What is the use of the IsAggregatable property?
In Analysis Services, dimensions generally have an All member. This comes from the IsAggregatable property of the attribute. If you set it to False, the attribute hierarchy has no All member. The All member is normally the default member for the attribute, so if you remove it you should explicitly set another member as the default. Otherwise Analysis Services picks a member as the default, which can confuse anyone browsing the data who is not aware of the change.
What are the key, name and value columns of an attribute?
Key column: Contains the column or columns that represent the key for the attribute, that is, the column in the underlying relational table in the data source view to which the attribute is bound. The value of this column for each member is displayed to users unless a value is specified for the NameColumn property.
Name column: Identifies the column that provides the name of the attribute that is displayed to users, instead of the value in the key column. This column is used when the key column value for an attribute member is cryptic or not otherwise useful to the user, or when the key column is based on a composite key. The NameColumn property is not used in parent-child hierarchies; instead, the NameColumn property for child members is used as the member names in a parent-child hierarchy.
Value column: Identifies the column that provides the value of the attribute. If the NameColumn element of the attribute is specified, the same DataItem values are used as default values for the ValueColumn element. If the NameColumn element is not specified and the KeyColumns collection contains a single KeyColumn element representing a key column with a string data type, the same DataItem values are used as default values for the ValueColumn element.

What is a hierarchy, what are its types, and what is the difference between them?
A hierarchy is a very important part of any OLAP engine. It allows users to drill down from summary levels to more detailed levels, and it represents the way users expect to explore the data. A hierarchy is made up of multiple levels, structured according to end-user requirements. For example, year -> quarter -> month -> week are the levels of a calendar hierarchy.
There are 2 types of hierarchies:
1. Natural hierarchy
2. Unnatural hierarchy
Natural hierarchy: The attributes are intuitively related to one another, and there is a clear relationship from the top of the hierarchy to the bottom. Example: date. Year, quarter and month follow from each other and, in part, define each other.
Unnatural hierarchy: The attributes are not clearly related. Example: geography. We may have country -> state -> city, but it is not clear where Province would sit.

What is an attribute hierarchy?
An attribute hierarchy is created for every attribute in a dimension, and each hierarchy is available for dimensioning fact data. The hierarchy consists of an All level and a detail level containing all members of the hierarchy. You can organize attributes into user-defined hierarchies to provide navigation paths in a cube. Under certain circumstances, you may want to disable or hide some attributes and their hierarchies.

What is the use of the AttributeHierarchyDisplayFolder property?
AttributeHierarchyDisplayFolder: Identifies the folder in which to display the associated attribute hierarchy to end users. For example, if you set the property to Test on all the attributes of a dimension, a folder named Test is created and all the attributes are placed in it.
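As a rough illustration of the natural versus unnatural distinction described above, the following Python sketch checks whether every member of a lower level rolls up to exactly one member of the level above it. This is not an SSAS API; the function name, the sample rows and the column names are all hypothetical and chosen only for illustration.

```python
def is_natural_level(rows, child, parent):
    """Return True if every child value maps to exactly one parent value."""
    parent_of = {}
    for row in rows:
        # setdefault records the first parent seen for this child and returns it
        if parent_of.setdefault(row[child], row[parent]) != row[parent]:
            return False
    return True


# Hypothetical date rows: each month belongs to one quarter, each quarter to one year.
dates = [
    {"year": 2009, "quarter": "2009-Q1", "month": "2009-01"},
    {"year": 2009, "quarter": "2009-Q1", "month": "2009-02"},
    {"year": 2009, "quarter": "2009-Q2", "month": "2009-04"},
]

# Hypothetical product rows: the same size appears under more than one colour.
products = [
    {"colour": "Red", "size": "S"},
    {"colour": "Blue", "size": "S"},
]

date_is_natural = (
    is_natural_level(dates, "month", "quarter")
    and is_natural_level(dates, "quarter", "year")
)
product_is_natural = is_natural_level(products, "size", "colour")

print(date_is_natural)     # month -> quarter -> year rolls up cleanly
print(product_is_natural)  # colour -> size does not
```

A check of this kind can be useful before defining attribute relationships, because a level that fails it cannot be related to its parent one-to-many.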
What is the use of AttributeHierarchyEnabled?
AttributeHierarchyEnabled: Determines whether an attribute hierarchy is generated by Analysis Services for the attribute. If the attribute hierarchy is not enabled, the attribute cannot be used in a user-defined hierarchy and the attribute hierarchy cannot be referenced in Multidimensional Expressions (MDX) statements.

What is the use of AttributeHierarchyOptimizedState?
AttributeHierarchyOptimizedState: Determines the level of optimization applied to the attribute hierarchy. By default, an attribute hierarchy is FullyOptimized, which means that Analysis Services builds indexes for the attribute hierarchy to improve query performance. The other option, NotOptimized, means that no indexes are built for the attribute hierarchy. NotOptimized is useful if the attribute hierarchy is used for purposes other than querying, for example to help order another attribute, because no additional indexes are built.

What is the use of AttributeHierarchyOrdered?
AttributeHierarchyOrdered: Determines whether the associated attribute hierarchy is ordered. The default value is True. However, if an attribute hierarchy will not be used for querying, you can save processing time by changing the value of this property to False.

What is the use of AttributeHierarchyVisible?
AttributeHierarchyVisible: Determines whether the attribute hierarchy is visible to client applications. The default value is True. Setting it to False hides the hierarchy from client browsing tools, but the hierarchy still exists and can still be referenced in MDX.

What are the types of storage modes?
There are three standard storage modes in OLAP applications:
1. MOLAP
2. ROLAP
3. HOLAP

Compare the three storage modes?
Summary and comparison of the basic storage modes:

                                             MOLAP                    HOLAP                    ROLAP
Storage location for detail data             Multidimensional format  Relational database      Relational database
Storage location for summary/aggregations    Multidimensional format  Multidimensional format  Relational database
Storage space requirement                    Medium (*)               Small                    Large
Query response time                          Fast                     Medium                   Slow
Processing time                              Fast                     Fast                     Slow
Latency                                      High                     Medium                   Low

(*) Medium because detail data is stored in compressed format.
What is MOLAP and what are its advantages?
MOLAP (Multidimensional Online Analytical Processing): MOLAP is the most used storage type. It is designed to offer maximum query performance to users. The data and aggregations are stored in a multidimensional format, compressed and optimized for performance. This is both good and bad. When a cube with MOLAP storage is processed, the data is pulled from the relational database, the aggregations are performed, and the data is stored in the Analysis Services database. The data inside the cube is refreshed only when the cube is processed, so latency is high.
Advantages:
1. Since the data is stored on the OLAP server in an optimized format, queries (even complex calculations) are faster than with ROLAP.
2. The data is compressed, so it takes up less space.
3. Because the data is stored on the OLAP server, you do not need to keep a connection to the relational database.
4. Cube browsing is fastest using MOLAP.

What is ROLAP and what are its advantages?
ROLAP (Relational Online Analytical Processing): ROLAP does not have the high-latency disadvantage of MOLAP. With ROLAP, the data and aggregations are stored in relational format, so there is effectively zero latency between the relational source database and the cube. The disadvantage of this mode is performance: it gives the poorest query performance because no objects benefit from multidimensional storage.
Advantages:
1. Since the data is kept in the relational database instead of on the OLAP server, you can view the data in almost real time.
2. Since the data is kept in the relational database, it allows for much larger amounts of data, which can mean better scalability.
3. Low latency.

What is HOLAP and what are its advantages?
HOLAP (Hybrid Online Analytical Processing): HOLAP is a combination of MOLAP and ROLAP. HOLAP stores the detail data in the relational database but stores the aggregations in multidimensional format.
Because of this, the aggregations need to be processed when changes occur. HOLAP gives medium query performance: not as slow as ROLAP, but not as fast as MOLAP. If, however, you query only aggregated data or use a cached query, performance is similar to MOLAP. When you need the detail data, performance is closer to ROLAP.
Advantages:
1. HOLAP is best used when large amounts of aggregations are queried often with little detail data, offering high performance and lower storage requirements.
2. Cubes are smaller than with MOLAP since the detail data is kept in the relational database.
3. Processing time is less than with MOLAP since only aggregations are stored in multidimensional format.
4. Low latency, since processing takes place when changes occur and detail data is kept in the relational database.

What are translations and what is their use?
Translation: The translation feature in Analysis Services allows you to display captions and attribute names that correspond to a specific language. It helps in providing GLOBALIZATION for the cube.

What is a database dimension?
All dimensions created using the New Dimension Wizard are database dimensions. In other words, dimensions that exist at the database level are called database dimensions.

What is a cube dimension?
A cube dimension is an instance of a database dimension within a cube. A database dimension can be used in multiple cubes, and multiple cube dimensions can be based on a single database dimension.

What is the difference between a database dimension and a cube dimension?
1. A database dimension has only Name and ID properties, whereas a cube dimension has several more properties.
2. A database dimension is created once, whereas a cube dimension is a reference to a database dimension.
3. A database dimension exists only once, whereas more than one cube dimension can be created from it using the ROLE PLAYING dimensions concept.

How will you add a dimension to a cube?
To add a dimension to a cube, follow these steps:
1. In Solution Explorer, right-click the cube, and then click View Designer.
2. In the Design tab for the cube, click the Dimension Usage tab.
3. Either click the Add Cube Dimension button, or right-click anywhere on the work surface and then click Add Cube Dimension.
4. In the Add Cube Dimension dialog box, do one of the following:
   To add an existing dimension, select the dimension, and then click OK.
   To create a new dimension to add to the cube, click New dimension, and then follow the steps in the Dimension Wizard.

What is SCD (slowly changing dimension)?
Slowly changing dimensions (SCD) determine how historical changes in the dimension tables are handled. Implementing the SCD mechanism enables users to know to which category an item belonged on any given date.

What are the types of SCD?
SCD is a concept for STORING historical changes, and whenever someone finds a new way to store them, a new type comes into the picture.
Basically there are 3 types of SCD:
1. SCD Type 1
2. SCD Type 2
3. SCD Type 3

What are Type 1, Type 2 and Type 3 of SCD?
Type 1: In a Type 1 Slowly Changing Dimension, the new information simply overwrites the original information. In other words, no history is kept. In our example, we originally have the following table:

Customer Key    Name         State
1001            Christina    Illinois
After Christina moved from Illinois to California, the new information replaces the original record, and we have the following table:

Customer Key    Name         State
1001            Christina    California
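The Type 1 overwrite shown above can be sketched in a few lines of Python. This is a minimal illustration using an in-memory SQLite database rather than a real warehouse; the table name dim_customer, its column names and the helper apply_type1 are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dim_customer ("
    "customer_key INTEGER PRIMARY KEY, name TEXT, state TEXT)"
)
conn.execute("INSERT INTO dim_customer VALUES (1001, 'Christina', 'Illinois')")


def apply_type1(conn, customer_key, new_state):
    """Type 1: overwrite the attribute in place, keeping no history."""
    conn.execute(
        "UPDATE dim_customer SET state = ? WHERE customer_key = ?",
        (new_state, customer_key),
    )


apply_type1(conn, 1001, "California")
type1_rows = conn.execute(
    "SELECT customer_key, name, state FROM dim_customer"
).fetchall()
print(type1_rows)
```

After the update the table still holds a single row for key 1001, and the old state can no longer be recovered from it.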
Advantages: This is the easiest way to handle the Slowly Changing Dimension problem, since there is no need to keep track of the old information.
Disadvantages: All history is lost. By applying this methodology, it is not possible to trace back in history.
Usage: About 50% of the time.
When to use Type 1: Type 1 should be used when it is not necessary for the data warehouse to keep track of historical changes.

Type 2: In a Type 2 Slowly Changing Dimension, a new record is added to the table to represent the new information. Therefore, both the original and the new record will be present. The new record gets its own primary key. In our example, we originally have the following table:

Customer Key    Name         State
1001            Christina    Illinois
After Christina moved from Illinois to California, we add the new information as a new row in the table:

Customer Key    Name         State
1001            Christina    Illinois
1005            Christina    California
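The Type 2 behaviour shown above can be sketched as follows, again as a minimal illustration on an in-memory SQLite table. The names dim_customer and apply_type2 are hypothetical, and the new surrogate key 1005 is passed in by hand to match the example; real implementations usually generate the key and often add effective-date or current-row flag columns, which are omitted here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dim_customer ("
    "customer_key INTEGER PRIMARY KEY, name TEXT, state TEXT)"
)
conn.execute("INSERT INTO dim_customer VALUES (1001, 'Christina', 'Illinois')")


def apply_type2(conn, new_key, name, new_state):
    """Type 2: keep the old row and add a new row with its own key."""
    conn.execute(
        "INSERT INTO dim_customer VALUES (?, ?, ?)",
        (new_key, name, new_state),
    )


apply_type2(conn, 1005, "Christina", "California")
type2_rows = conn.execute(
    "SELECT customer_key, name, state FROM dim_customer ORDER BY customer_key"
).fetchall()
print(type2_rows)
```

Both rows survive, so facts recorded against key 1001 still report Illinois while newer facts recorded against key 1005 report California.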
Advantages: This allows us to accurately keep all historical information.
Disadvantages:
1. This will cause the size of the table to grow fast. In cases where the number of rows in the table is very high to start with, storage and performance can become a concern.
2. This necessarily complicates the ETL process.
Usage: About 50% of the time.

Type 3: In a Type 3 Slowly Changing Dimension, there are two columns for the particular attribute of interest, one holding the original value and one holding the current value. There is also a column that indicates when the current value became active. In our example, we originally have the following table:

Customer Key    Name         State
1001            Christina    Illinois
To accommodate a Type 3 Slowly Changing Dimension, we now have the following columns: Customer Key, Name, OriginalState, CurrentState, Effective Date.
After Christina moved from Illinois to California, the original information gets updated, and we have the following table (assuming the effective date of change is January 15, 2003):

Customer Key    Name         OriginalState    CurrentState    Effective Date
1001            Christina    Illinois         California      15-JAN-2003
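The Type 3 update shown above can be sketched in the same style. This is a minimal illustration on an in-memory SQLite table; dim_customer, its snake_case column names and apply_type3 are hypothetical, and the date is stored as ISO text purely for simplicity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dim_customer ("
    "customer_key INTEGER PRIMARY KEY, name TEXT, "
    "original_state TEXT, current_state TEXT, effective_date TEXT)"
)
# Before any change, the original and current values are the same.
conn.execute(
    "INSERT INTO dim_customer VALUES (1001, 'Christina', 'Illinois', 'Illinois', NULL)"
)


def apply_type3(conn, customer_key, new_state, effective_date):
    """Type 3: keep original_state, overwrite current_state and its date."""
    conn.execute(
        "UPDATE dim_customer SET current_state = ?, effective_date = ? "
        "WHERE customer_key = ?",
        (new_state, effective_date, customer_key),
    )


apply_type3(conn, 1001, "California", "2003-01-15")
type3_row = conn.execute("SELECT * FROM dim_customer").fetchone()
print(type3_row)
```

Because only current_state is overwritten, calling apply_type3 a second time would replace California with the newer value while original_state stays Illinois.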
Advantages:
1. This does not increase the size of the table, since new information is updated in place.
2. This allows us to keep some part of history.
Disadvantages: Type 3 cannot keep all history when an attribute changes more than once. For example, if Christina later moves to Texas on December 15, 2003, the California information will be lost.
Usage: Type 3 is rarely used in actual practice.

What is a role-playing dimension? Give two examples.
Role-playing dimension: A single dimension that plays different roles in the same fact table. This is not the same thing as a conformed dimension, which is shared across different fact tables. You can recognize a role-playing dimension when there are multiple columns in a fact table that each have a foreign key to the same dimension table.
Ex1: There are three dimension keys in the factinternalsales and factresellersales tables which all refer to the dimtime table, so the same time dimension is used to track sales by each of those dates. When a cube is built on either of these fact tables, the corresponding role-playing dimensions are automatically added to the cube.
Ex2: In retail banking, a checking account cube could have a transaction date dimension and an effective date dimension. Both dimensions have date, month, quarter and year attributes. The formats of the attributes are the same on both dimensions; for example, the date attribute is in dd-mm-yyyy format. Both dimensions have members from 1993 to 2010.

What is a measure group? What is a measure?
Measure groups: A measure group is a collection of measures that come from the same fact table, and a cube can contain several of them. These measure groups can contain different dimensions and be at different granularity, but so long as you model your cube correctly, your users will be able to use measures from each of these measure groups in their queries easily and without worrying about the underlying complexity.
Creating multiple measure groups: To create a new measure group in the Cube Editor, go to the Cube Structure tab, right-click on the cube name in the Measures pane and select New Measure Group.
You will then need to select the fact table to create the measure group from, and the new measure group will be created. Any columns that are not used as foreign key columns in the DSV will automatically be created as measures, and you will also get an extra measure of aggregation type Count. It is a good idea to delete any measures you are not going to use at this stage.
Measures: Measures are the numeric values that our users want to aggregate, slice, dice and otherwise analyze, and as a result it is important to make sure they behave the way we want them to. One of the fundamental reasons for using Analysis Services is that, unlike a relational database, it allows us to build business rules about measures into our cube design: how they should be formatted, how they should aggregate up, how they interact with specific dimensions and so on.

What is an attribute?
An attribute is a specification that defines a property of an object, element, or file. It may also refer to or set the specific value for a given instance of such.

What is a surrogate key?
A surrogate key is a system-generated key that acts as an alternate primary key for a table in the database. Data warehouses commonly use a surrogate key to uniquely identify an entity. A surrogate key is generated by the system, not by the user. In some databases, the primary difference between a primary key and a surrogate key is that the primary key uniquely identifies a record while the surrogate key uniquely identifies an entity.
Ex: One employee may be recruited before the year 2000 while another employee with the same name may be recruited after the year 2000. Here, the primary key uniquely identifies the record, while the surrogate key is generated by the system (say, a serial number), since the SK is NOT derived from the data.

How many types of relationships are there between a dimension and a measure group?
There are six types of relationship between a dimension and a measure group:
1. No Relationship
2. Regular
3. Reference
4. Many to Many
5. Data Mining
6. Fact

What are the regular, no relationship, fact, referenced, many-to-many and data mining types?
No relationship: The dimension and measure group are not related.
Regular: The dimension table is joined directly to the fact table.
Referenced: The dimension table is joined to an intermediate table, which in turn is joined to the fact table.
Many to many: The dimension table is joined to an intermediate fact table, and the intermediate fact table is joined, in turn, to an intermediate dimension table to which the fact table is joined.
Data mining: The target dimension is based on a mining model built from the source dimension. The source dimension must also be included in the cube.
Fact: The dimension table is the fact table.

What are calculated members and what is their use?
Calculations are items in the cube that are evaluated at runtime.
Calculated members: You can create customized measures or dimension members, called calculated members, by combining cube data, arithmetic operators, numbers, and/or functions.
Example: You can create a calculated member called Marks that converts dollars to marks by multiplying an existing dollar measure by a conversion rate. Marks can then be displayed to end users in a separate row or column. Calculated member definitions are stored, but their values exist only in memory. In the preceding example, values in marks are displayed to end users but are not stored as cube data.

What are KPIs and what is their use?
In Analysis Services, a KPI is a collection of calculations, associated with a measure group in a cube, that are used to evaluate business success. We use a KPI to see the state of the business at a particular point in time. It is represented with graphical items such as traffic lights, gauges, etc.
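To make the KPI idea above concrete, the following Python sketch compares an actual value with a goal and maps the result to a traffic-light indicator. In SSAS the status is written as an MDX expression, and status expressions conventionally return a value between -1 and 1; this sketch only illustrates that logic. The function name kpi_status, the 0.8 warning threshold and the sample figures are assumptions made for illustration.

```python
def kpi_status(value, goal, warn_ratio=0.8):
    """Return 1 (on target), 0 (warning) or -1 (off target)."""
    if goal <= 0:
        raise ValueError("goal must be positive")
    ratio = value / goal
    if ratio >= 1.0:
        return 1
    if ratio >= warn_ratio:
        return 0
    return -1


# Hypothetical mapping from status value to a traffic-light graphic.
INDICATOR = {1: "green", 0: "yellow", -1: "red"}

statuses = [kpi_status(sales, 100) for sales in (120, 85, 50)]
lights = [INDICATOR[status] for status in statuses]
print(statuses, lights)
```

Client tools then pick the graphic (traffic light, gauge and so on) from the status value, so the same calculation can be displayed in different ways.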
What are actions, how many types of actions are there, explain with examples?
Actions are a powerful way of extending the value of SSAS cubes for the end user. Users can click on a cube or a portion of a cube to start an application with the selected item as a parameter, or to retrieve information about the selected item.
One of the objects supported by a SQL Server Analysis Services cube is the action. An action is an event that a user can initiate when accessing cube data. The event can take a number of forms. For example, a user might be able to view a Reporting Services report, open a Web page, or drill through to detailed information related to the cube data.
Analysis Services supports three types of actions:
Report action: Returns a Reporting Services report that is associated with the cube data on which the action is based.
Drillthrough: Returns a result set that provides detailed information related to the cube data on which the action is based.
Standard: Has five action subtypes that are based on the specified cube data.
   Dataset: Returns a multidimensional dataset.
   Proprietary: Returns a string that can be interpreted by a client application.
   Rowset: Returns a tabular rowset.
   Statement: Returns a command string that can be run by a client application.
   URL: Returns a URL that can be opened by a client application, usually a browser.

What is a partition, how will you implement it?
You can use the Partition Wizard to define partitions for a measure group in a cube. By default, a single partition is defined for each measure group in a cube. Access and processing performance, however, can degrade for large partitions. By creating multiple partitions, each containing a portion of the data for a measure group, you can improve the access and processing performance for that measure group.

What is the minimum and maximum number of partitions required for a measure group?
In SSAS 2005 a MAXIMUM of 2000 partitions can be created per measure group; that limit is lifted in later versions. In any version the MINIMUM is ONE partition per measure group.

What are aggregations and what is their use?
Aggregations provide performance improvements by allowing Microsoft SQL Server Analysis Services (SSAS) to retrieve pre-calculated totals directly from cube storage instead of having to recalculate data from an underlying data source for each query. To design these aggregations, you can use the Aggregation Design Wizard. This wizard guides you through the following steps:
1. Selecting standard or custom settings for the storage and caching options of a partition, measure group, or cube.
2. Providing estimated or actual counts for objects referenced by the partition, measure group, or cube.
3. Specifying aggregation options and limits to optimize the storage and query performance delivered by designed aggregations.
4. Saving and optionally processing the partition, measure group, or cube to generate the defined aggregations.
After you use the Aggregation Design Wizard, you can use the Usage-Based Optimization Wizard to design aggregations based on the usage patterns of the business users and client applications that query the cube.

What is a perspective, have you ever created a perspective?
Perspectives are a way to reduce the complexity of cubes by hiding elements such as measure groups, measures, dimensions and hierarchies. A perspective is nothing but a slice of a cube. For example, if the cube holds retail and hospital data and an end user is subscribed to see only hospital data, we can create a perspective that exposes only that data.

What is deploy, process and build?
Build: Verifies the project files and creates several local files.
Deploy: Deploys the structure of the cube (the skeleton) to the server.
Process: Reads the data from the source and builds the dimensions and cube structures.
These are elaborated below.
Build: In general software terms, a build is a version of a program. As a rule, a build is a pre-release version and as such is identified by a build number rather than by a release number. Reiterative (repeated) builds are an important part of the development process. Throughout development, application components are collected and repeatedly compiled for testing purposes, to ensure a reliable final product. Build tools, such as make or Ant, enable developers to automate some programming tasks. As a verb, to build can mean either to write code or to put individual coded components of a program together. In an Analysis Services project, building only validates the project and produces local output files; it does not process any cube data.
Deployment: During development of an Analysis Services project in Business Intelligence Development Studio, you frequently deploy the project to a development server in order to create the Analysis Services database defined by the project. This is required to test the project, for example to browse cells in the cube, browse dimension members, or verify key performance indicator (KPI) formulas.

What is the maximum size of a dimension?
The maximum size of a dimension is 4 GB.

What are the types of processing? Explain each.
SSAS offers the following processing options:
Process Full
Process Data
Process Index
Process Incremental
Process Structure
Unprocess
Process Default

Process Full: Processes an Analysis Services object and all the objects that it contains. When Process Full is executed against an object that has already been processed, Analysis Services drops all data in the object and then processes the object. This kind of processing is required when a structural change has been made to an object, for example when an attribute hierarchy is added, deleted, or renamed. This processing option is supported for cubes, databases, dimensions, measure groups, mining models, mining structures, and partitions.

Process Data: Processes data only, without building aggregations or indexes. If there is data in the partitions, it is dropped before the partition is re-populated with source data. This processing option is supported for dimensions, cubes, measure groups, and partitions.
Process Index: Creates or rebuilds indexes and aggregations for all processed partitions. This option causes an error on unprocessed objects. This processing option is supported for cubes, dimensions, measure groups, and partitions.

Process Incremental: Adds newly available fact data and processes only the relevant partitions. This processing option is supported for measure groups and partitions.

Process Structure: If the cube is unprocessed, Analysis Services processes, if necessary, all of the cube's dimensions. After that, Analysis Services creates only the cube definitions. If this option is applied to a mining structure, it populates the mining structure with source data. The difference between this option and Process Full is that this option does not carry the processing down to the mining models themselves. This processing option is supported for cubes and mining structures.

Unprocess: Drops the data in the specified object and in any lower-level constituent objects. After the data is dropped, it is not reloaded. This processing option is supported for cubes, databases, dimensions, measure groups, mining models, mining structures, and partitions.

Process Default: Detects the process state of an object and performs the processing necessary to bring unprocessed or partially processed objects to a fully processed state. This processing option is supported for cubes, databases, dimensions, measure groups, mining models, mining structures, and partitions.

What is a cube?
The basic unit of storage and analysis in Analysis Services is the cube. A cube is a collection of data that has been aggregated so that queries return data quickly. For example, a cube of order data might be aggregated by time period and by title, making the cube fast when you ask questions about orders by week or orders by title.

What is AMO?
AMO stands for Analysis Management Objects. It is used to create or alter cubes from .NET code.
After creating the cube, if a new column is added to the OLTP table, how do you add this new attribute to the cube?
Open the Data Source View, right-click, and choose Refresh. The refresh picks up the new column in the table, which can then be added to the cube as an attribute.

REAL TIME INTERVIEW QUESTIONS

What is the size of the cube in your last project?
The answer varies from project to project and mainly depends on how big your database is and how complex the database design is. Generally, for a database with a transaction table of 50 crore (500 million) records, the cube size will be around 100 GB. So it is better to go with 100 GB as the answer to this question.

What is the size of the database in your last project?
You can expect this question immediately after you answer 100 GB to the last question. The database size will be 600 to 800 GB for a cube of about 100 GB. So go with 800 GB for this question.

What is the size of the fact (transaction) table in your last project?
This will be the next question if you answer 800 GB as your database size. Here the interviewer is not expecting a size in GB but the number of rows in the transaction table. Go with 57 crore records for this question.

How frequently do you process the cube?
You have to be very careful here. The frequency of cube processing depends on how frequently you get new data. Once new data arrives, the SSIS team loads it and sends a mail to the SSAS team after the load completes successfully. The SSAS team then looks for the best time to process. Typically we get data either weekly or monthly, so you can say that the cube is processed either weekly or monthly.

How frequently do you get data from clients?
This answer should be based on your last answer. If you answered weekly to the last question, answer weekly here too; if you answered monthly, answer monthly.

What type of processing options did you use to process the cube in your project?
This is the toughest question to answer. It depends on the data you have and on the client's requirements. Let me explain.
1. If the database is small, say only 1 crore (10 million) records, people do a full process because it won't take much time.
2. If the database is medium, say 15 crore records, people prefer an incremental process unless the client asks for a full process, because a full process takes a fair amount of time.
3. If the database is huge, say more than 35 to 40 crore records, people prefer an incremental process unless the client asks for a full process, because a full process takes a lot of time. In this case we try to convince the client to accept incremental processing; if they don't agree, we have no other option.
4. Incremental processing comes into the picture only when there are no updates to old data, that is, no changes to already existing data. Otherwise there is no option other than a full process.

How do you provide security to a cube?
By defining roles. Using roles we can restrict users from accessing restricted data. The procedure is as follows:
1. Define the role
2. Set permissions
3. Add the appropriate users to the role

How do you move the cube from one server to another?
There are many ways to do this. Let me explain four here; you can say that you have worked on four SSAS projects to date and used a different method in each.
1. Backup and restore. This is the simplest way. Take a backup from the development server and copy it to the client's FTP folder. Then drop a mail to the client's admin, who will take care of the restore.
2. Process the cube directly in the production environment. For this you need access to production, which clients rarely grant. One client I worked for did give me full access.
3. The Deployment Wizard, found under Start > All Programs > SQL Server > Analysis Services. This method has some steps to follow. First build your project, and four files will be created in the bin folder of the project folder (.asdatabase, .configsettings, .deploymentoptions, and .deploymenttargets). Copy those four files to any directory on the production server. Then open the Deployment Wizard in production and, when it asks for the database file, point it to the location where you copied the files. After that click Next through the remaining screens and the cube will be deployed and processed.
4. Synchronization. This is the most elegant way. We first deploy and process the cube in the staging environment and then go to the production server. Connect to Analysis Services in
SSMS and select Synchronize by right-clicking the Databases folder of the Analysis Services instance. Then select the staging server as the source and click OK. The changes to the cube on the staging server will be copied to the production server.

What is the toughest challenge you faced in your project?
There are a couple of things where we face difficulty:
1. Working on relationships between measure groups and dimensions
2. Working on complex calculations
3. Performance tuning

How did you create partitions of the cube in your last project?
Partitions can be created on different data. A few people partition by product name, and many prefer to partition by date. Go with date. With dates, we can partition by month, week, quarter, and sometimes year. It all depends on how much data arrives per week, month, or quarter. If you are getting 50 lakh (5 million) records per month, say that you partition by month.

How many dimensions were in your last cube?
47 to 50.

How many measure groups were in your last cube?
10 in total; 4 of them are fact tables and the remaining 6 are factless fact tables.

What is the schema of your last cube?
Snowflake.

Why not a star schema?
My database design doesn't support a star schema.

What are the different relationships that you used in your cube?
1. Regular
2. Referenced
3. Many to Many
4. Fact
5. No Relationship

Have you created KPIs? If so, explain.
Don't add much here, as the follow-up questions will be tricky. Just say that you worked on a couple of KPIs and have basic knowledge of them. (Don't worry, this is not mandatory.)

How did you define aggregations in your project?
We defined the aggregations for the data most frequently used in SSRS reports.

What was the size of the SSAS team in your last project?
Just 2 people, as we are really in demand and scarce :)

How many resources worked on the same cube in your project?
Only 2: one on the morning shift and the other on the evening shift.
How much time does it take to process the cube?
This is a very important question. The answer again depends on the size of the database, the complexity of the database, and your server settings. For a database with 50 crore (500 million) transaction records, it generally takes 3.5 hours.

How many calculations did you do in your project?
My own answer is more than 5,000, but if you say the same you will be caught out unless you are very good at MDX. The best answer for you is that you worked on 50 calculations.
About MDX
MultiDimensional eXpressions (MDX) is an extremely powerful language that can allow you to fully realize the potential of your multidimensional database/cube. In this article, I'll be covering the basics of MDX and will hopefully provide you with a solid understanding of MDX query syntax, why MDX is so powerful, and how you can use MDX to add serious value to your B.I. solution.

According to MSDN, the purpose of MDX is to make accessing data from multiple dimensions easier. And that's exactly what MDX does, which makes it ideal for viewing data broken down by multiple categories and aggregated. You may be asking yourself, "Why not just write a SQL stored procedure and query the data warehouse? Anything I can get from the cube I can get from the data warehouse!" While this is true, imagine you have been given the requirements for a report which should include Actual Sales, Sales Goals, and Prior Period Sales broken down by Sales Region, Stores, and Month, grouped by Year. A stored procedure that pulls the required data from the data warehouse would be large and messy, and more than likely it would execute very slowly simply because a data warehouse is not optimal for reporting on aggregated data.

Because a cube stores your data in an aggregated format, MDX can allow you to view that aggregated data quickly and efficiently across multiple dimensions (such as Region, Stores, and Month, just to name a few) in only a few lines of code. In short, MDX can allow you to build otherwise extremely complicated queries and reports with just a fraction of the code. Not only will the queries and reports be easier to build, they will usually execute much faster than if you had queried the same data with a SQL statement. With that said, let's go over some of the terms we'll need to understand before we can begin writing our first MDX query.
Basic Terms:
Cube: A cube is a collection of measures, or facts, and dimensions which are based on tables and views, usually from a data warehouse. The cube can store pre-calculated aggregations of its measures for combinations of its dimensions. For example, Sales may be a measure in a cube. At any time you may want to view Sales broken down by Product. The aggregated Sales for each product are stored within the cube. This is why MDX queries against the cube are so much faster than querying the data warehouse. In the image below, you can see a graphical representation of a cube with multiple axes featuring Time, Product, and Store dimensions. Also, be aware that an Analysis Services cube is not a cube geometrically speaking. The term cube is the accepted industry term for a multidimensional database.
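To make the idea of stored aggregates concrete, here is a minimal Python sketch. It is not how Analysis Services stores data internally; the fact rows, product names, and the build_aggregates helper are all invented for illustration. It computes Sales totals per product once, so that a later "query" reads a stored total instead of rescanning the detail rows.

```python
from collections import defaultdict

# Hypothetical fact rows: (product, store, sales). All values are made up.
FACT_ROWS = [
    ("Bike", "Store A", 100.0),
    ("Bike", "Store B", 250.0),
    ("Helmet", "Store A", 40.0),
    ("Helmet", "Store B", 60.0),
]

def build_aggregates(rows):
    """Pre-compute total sales per product, loosely analogous to cube processing."""
    totals = defaultdict(float)
    for product, _store, sales in rows:
        totals[product] += sales
    return dict(totals)

# "Processing" happens once...
SALES_BY_PRODUCT = build_aggregates(FACT_ROWS)

# ...and each later query is a lookup rather than a scan of the fact rows.
print(SALES_BY_PRODUCT["Bike"])  # 350.0
```

The trade-off mirrors the one described for cubes: the totals are fast to read, but they are only as fresh as the last time build_aggregates was run.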
Dimension: A dimension organizes data in relation to a certain interest. An example of a dimension in the Adventure Works cube is the Product dimension. A dimension can be based on a table, a view, or a select statement bringing together data from multiple tables and/or views.
Attribute: An attribute can be thought of as a qualitative way to describe a dimension. For example, some of the attributes of the Product dimension in the Adventure Works cube are Category, Model Name, Product, and Style, just to name a few.
Measures (Facts): A measure, or fact, contains quantitative data that can be aggregated. Some of the measures in the Reseller Sales measure group in the Adventure Works cube are Reseller Tax Amount, Reseller Freight Amount, and Reseller Order Quantity. These measures can be aggregated across dimensions in our cube. For instance, it may be valuable to the end user to see Reseller Order Quantities broken down by product, region, time period, or all of the above. Measures can be viewed this way.
Hierarchy: A hierarchy is a hierarchical structure of dimension attributes. Within a hierarchy are levels and within a level are members, which you'll learn about in a moment. In the image below, we have an illustration of a hierarchy. You can clearly see how the levels make up a hierarchy and the members that are part of those levels.
Level: As mentioned above, a hierarchy is made up of levels. In the Adventure Works cube, the Fiscal Year hierarchy is made up of five levels: the top level is Year, the second level is Semester, the third level is Quarter, the fourth level is Month, and the final level is Date. When viewing aggregated measures across a hierarchy, we can see the measure aggregated up to each member of each level.
Member: A member is a value in a level. An example of a member of the Month level of the Fiscal Year hierarchy in the Adventure Works cube would be March.
Axis: An axis can be thought of as a line on which a dimension rests. This line and the dimension intersect our cube and measures. An MDX query can have up to 128 axes, but only the first five have aliases (COLUMNS, ROWS, PAGES, SECTIONS, and CHAPTERS).
A tuple:
    ([Date].[Fiscal Year].&[2006], [Sales Reason].[Sales Reason].&[9])
Set: A set is a collection of tuples. In the example of a set (seen below), I have two tuples within the set. The dimensionality of tuples within a set must be the same or your query will throw an error. In other words, the dimensions used in the tuples of a set must be the same and in the same order. A set must be wrapped in curly brackets and each tuple in the set should have parentheses around it.
A collection of tuples:
    {
      ([Date].[Fiscal Year].&[2006], [Sales Reason].[Sales Reason].&[9]),
      ([Date].[Fiscal Year].&[2006], [Sales Reason].[Sales Reason].&[5])
    }
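The dimensionality rule for sets can be illustrated with a short Python sketch. This is only a model of the rule, not an MDX parser: the helper names are invented, and it assumes every member is written with an explicit key in the form hierarchy.&[key], as in the examples in this article.

```python
def dimensionality(tuple_members):
    """Return the ordered list of hierarchies used by one tuple.

    Assumes each member looks like "[Date].[Fiscal Year].&[2006]"; the
    hierarchy is taken to be everything before the final ".&[key]" part.
    """
    return [member.rsplit(".&", 1)[0] for member in tuple_members]

def is_valid_set(tuples):
    """A set is valid when every tuple uses the same hierarchies in the same order."""
    if not tuples:
        return True
    first = dimensionality(tuples[0])
    return all(dimensionality(t) == first for t in tuples[1:])

good_set = [
    ("[Date].[Fiscal Year].&[2006]", "[Sales Reason].[Sales Reason].&[9]"),
    ("[Date].[Fiscal Year].&[2006]", "[Sales Reason].[Sales Reason].&[5]"),
]
# Same hierarchies, but the second tuple lists them in a different order.
bad_set = [
    ("[Date].[Fiscal Year].&[2006]", "[Sales Reason].[Sales Reason].&[9]"),
    ("[Sales Reason].[Sales Reason].&[5]", "[Date].[Fiscal Year].&[2006]"),
]

print(is_valid_set(good_set))  # True
print(is_valid_set(bad_set))   # False
```

The second set fails for the reason given in the definition of a set: both tuples use the same two hierarchies, but not in the same order.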
MDX Query Syntax
Here we have a basic MDX query. Let's take a look at each piece of this query so we can understand the necessary parts of a basic MDX statement. When you first look at this query, you may think that it appears awfully similar to T-SQL, but that is where the similarities end. MDX statements function very differently from T-SQL, and it is important to remember that when you are learning to write MDX.

The first thing your MDX query will need is a SELECT statement. The SELECT statement in an MDX query is very different from a SELECT statement in a SQL query. In T-SQL, the SELECT statement only defines the column layout of your query results. In an MDX query, the SELECT statement defines the layout of multiple dimensions along up to 128 different axes. In our example query, I have specified that the Fiscal Year attribute of the Date dimension be displayed along the Rows axis and the Internet Order Quantity measure be displayed along the Columns axis.

The second thing your MDX query will need is a FROM statement. The FROM statement specifies the context of your query, or which cube you wish to query. In T-SQL, you can use the FROM statement to join multiple tables from multiple databases, but in MDX there is no joining of cubes together, so don't even think about it.

The last piece of this MDX query is the WHERE statement. The WHERE statement limits the results of our MDX query by specifying what is known as a slicer dimension. In this example I am restricting the results of the MDX query to only the member with key 1 (which is actually the Bikes category) of the Category attribute of the Product dimension.

Basic Queries To Get Us Started
So now that we've learned a little bit about MDX, have covered some of the necessary terminology, and understand the MDX query syntax, let's start writing some MDX.

Important Note: All of the examples in this article are for use with the Adventure Works 2008 Analysis Services sample database, which is available as a download. Once the database is downloaded and installed, open the Analysis Services project (located by default in C:\Program Files\Microsoft SQL Server\100\Tools\Samples) in BIDS.
Make sure you change the data source connection to the instance of SQL Server where you installed the sample database. Then deploy the project. Open SQL Server Management Studio. When the Connect to Server dialog window pops up, make sure to select Analysis Services in the drop-down list next to Server type. Specify the server where the Adventure Works AS database is located and click Connect. Navigate to the Analysis Services database called AdventureWorksCube. Right-click, select New Query, and select MDX.
Now that the MDX query editor window is open, we can start writing some MDX. The first query is very simple and its purpose is to show us the default measure of the Adventure Works cube.
    SELECT
    FROM [ADVENTURE WORKS]
Results:
If you execute the query seen above, you'll see a single number returned. While it would appear that we have not defined a measure or dimension in our query, the number being returned by the query is actually the default measure specified in the Adventure Works cube, which is the Reseller Sales Amount. Since no dimension has been specified, this query returns the sum total of the Reseller Sales Amount across all products, all time, all regions, etc. So unless you explicitly specify in your query which measure you want to see, the measure that is going to be returned is the default measure, Reseller Sales Amount. One way we could explicitly request a different measure is to use the WHERE statement and specify a slicer.
    SELECT
    FROM [ADVENTURE WORKS]
    WHERE [Measures].[Reseller Total Product Cost]
Results:
Because we have used the WHERE statement to limit the scope of the query to the Reseller Total Product Cost, this query will return the sum total of the Reseller Total Product Cost, instead of the Reseller Sales Amount, across all dimensions. Now let's specify some dimensions by which to slice and dice our cube.
    SELECT [Date].[Calendar].[Calendar Year] ON COLUMNS,
    [Product].[Product Categories].[Category] ON ROWS
    FROM [ADVENTURE WORKS]
Results:
Now that we have specified a dimension on both the Columns axis and the Rows axis, we are starting to realize the power of the cube. Compare the above MDX query to a SQL query that would return similar results and you will see one of the reasons why MDX (and Analysis Services) is so powerful. Can you guess what measure is being displayed in our query results? If you said the Reseller Sales Amount, you would be right. Once again, because we have not explicitly specified which measure to bring back, the default measure, Reseller Sales Amount, will be returned. For our last query, let's bring it all together and specify a dimension and attribute on each axis and a slicer dimension:
    SELECT [Date].[Calendar].[Calendar Year] ON COLUMNS,
    [Product].[Product Categories].[Category] ON ROWS
    FROM [ADVENTURE WORKS]
    WHERE [Measures].[Reseller Total Product Cost]
Results:
The above query is exactly the same as the previous query, except that we have now explicitly limited the scope of the query to the Reseller Total Product Cost.
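For readers who think more easily in procedural code, the shape of that last result can be sketched in Python. This is only an analogy for what the axes and the slicer do, not how Analysis Services evaluates MDX; the rows, the numbers, and the crosstab helper are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical detail rows: (calendar_year, category, measure_name, value).
ROWS = [
    (2007, "Bikes", "Reseller Sales Amount", 500.0),
    (2007, "Bikes", "Reseller Total Product Cost", 300.0),
    (2008, "Bikes", "Reseller Sales Amount", 700.0),
    (2008, "Bikes", "Reseller Total Product Cost", 450.0),
    (2008, "Clothing", "Reseller Total Product Cost", 50.0),
    (2008, "Clothing", "Reseller Total Product Cost", 25.0),
]

def crosstab(rows, slicer_measure):
    """Return {category: {year: total}} for a single measure.

    category plays the part of the ROWS axis, year the COLUMNS axis,
    and slicer_measure the part of the WHERE statement.
    """
    grid = defaultdict(lambda: defaultdict(float))
    for year, category, measure, value in rows:
        if measure == slicer_measure:
            grid[category][year] += value
    return {category: dict(by_year) for category, by_year in grid.items()}

result = crosstab(ROWS, "Reseller Total Product Cost")
print(result)
```

Changing the slicer_measure argument changes which measure fills the cells, just as changing the WHERE statement does in the two queries above, while the layout of rows and columns stays the same.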
OVERVIEW
SQL Server Analysis Services (SSAS) is the technology in the Microsoft Business Intelligence stack for developing Online Analytical Processing (OLAP) solutions. In simple terms, you can use SSAS to create cubes using data from data marts or a data warehouse for deeper and faster data analysis.

Cubes are multi-dimensional data sources which have dimensions and facts (also known as measures) as their basic constituents. From a relational perspective, dimensions can be thought of as master tables and facts can be thought of as measurable details. These details are generally stored in a pre-aggregated proprietary format, and users can analyze huge amounts of data and slice this data by dimensions very easily. Multidimensional Expressions (MDX) is the query language used to query a cube, similar to the way T-SQL is used to query a table in SQL Server.

Simple examples of dimensions are product, geography, time, and customer; simple examples of facts are orders and sales. A typical analysis could be to analyze sales in the Asia-Pacific geography during the past 5 years. You can think of this data as a pivot table where geography is the column axis, years are the row axis, and sales are the values. Geography can also have its own hierarchy, like Country -> State -> City. Time can also have its own hierarchy, like Year -> Semester -> Quarter. Sales could then be analyzed using any of these hierarchies for effective data analysis.

A typical high-level cube development process using SSAS involves the following steps:
1) Reading data from a dimensional model
2) Configuring a schema in BIDS (Business Intelligence Development Studio)
3) Creating dimensions, measures and cubes from this schema
4) Fine tuning the cube as per the requirements
5) Deploying the cube

In this tutorial we will step through a number of topics that you need to understand in order to successfully create a basic cube. Our high level outline is as follows:
o Design and develop a star schema
o Create dimensions, hierarchies, and cubes
o Process and deploy a cube
o Develop calculated measures and named sets using MDX
o Browse the cube data using Excel as the client tool

When you start learning SSAS, you should have a reasonable relational database background. But when you start working in a multi-dimensional environment, you need to stop thinking from a two-dimensional (relational database) perspective; this way of thinking will develop over time. In this tutorial, we will also try to develop an understanding of OLAP development through the eyes of an OLTP practitioner.

Creating a Sample SSAS Project and Cube

Data in Online Transaction Processing (OLTP) systems is organized to support convenient data storage for user-facing applications. The data model in such systems is highly normalized. For data warehousing environments, data is required to be in a schema that supports a dimensional model. Data is therefore transformed from the OLTP storage systems to a data warehouse using ETL, so that data can be aligned in a suitable format to create data marts from the data warehouse.

The two major theories driving the design of data warehouses and data marts come from Ralph Kimball and Bill Inmon, and both are widely practiced in real-world environments. Generally, data is gathered from OLTP systems and brought to the data warehouse. From the data warehouse, context-specific or requirement-specific data marts are created, which can be perceived as subsets of the data warehouse. Cubes source data from these data marts, and client applications connect to the cubes.

The schema for a cube falls into two categories: Star and Snowflake. In simple terms, a Star schema can be considered a more denormalized form of schema compared to Snowflake. Designing and developing a data warehouse is out of scope for this tutorial. For the purpose of development, we will install and use the AdventureWorks DW database.
We will then create an SSAS project and create a data source which will connect to this database. Finally we will create a star schema using a Data Source View.

Installing AdventureWorks Sample Database
OVERVIEW
AdventureWorks is the sample database available from Microsoft for different purposes as well as different SQL Server versions. We need to use the AdventureWorks DW 2008 R2 database for our cube design and development. This database contains dimension and fact tables with prepopulated data. We can use this database as a launchpad to start our SSAS project. Developing a data mart is out of the scope of this tutorial, so we will use this sample database.
EXPLANATION
To install the AdventureWorks database, navigate to the codeplex (http://msftdbprodsamples.codeplex.com/) site and download the MSI for the version of SQL Server you are using. This tutorial expects that the reader is using SQL Server 2008 R2, and all the exercises will be using this version of SQL Server. After downloading, start the installer and you should get a screen similar to the one below.
AdventureWorks Data Warehouse 2008R2 is the database we need for our exercises. Point the installer to the SQL Server instance that you are using, and install the database. After the database is installed, open SQL Server Management Studio to verify the databases that were installed. You should find something similar to the below screenshot.
Expand the database highlighted above and check out the different Dim and Fact tables in this database. The tables with the prefix Dim are suited to be used as dimension tables, and the tables with the prefix Fact are suited to be used as fact tables.
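The naming convention can be expressed as a small Python sketch. The classify_tables helper is invented for illustration, and the short table list is only a sample of AdventureWorks DW table names, assumed here rather than read from a live database.

```python
def classify_tables(table_names):
    """Split table names into dimension and fact candidates by their prefix."""
    dims = [name for name in table_names if name.startswith("Dim")]
    facts = [name for name in table_names if name.startswith("Fact")]
    return dims, facts

# Sample table names, assumed for illustration.
tables = [
    "DimDate",
    "DimProduct",
    "DimSalesTerritory",
    "FactInternetSales",
    "FactResellerSales",
]

dims, facts = classify_tables(tables)
print(dims)
print(facts)
```

The prefix is only a convention; whether a table really works as a dimension or a fact table still depends on its keys and on the columns it holds.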
OVERVIEW
To start development, we need to create a new SSAS project using Business Intelligence Development Studio. After creating the new project, we need to create a data source that points to the AdventureWorks DW 2008 R2 database.
EXPLANATION
Open Business Intelligence Development Studio (BIDS). Create a new SSAS project by selecting New Project from the File menu. Name this project MyOLAPProject. As soon as the new project opens, you should find a list of folders in the explorer tab. Right-click the Data Sources folder and select New Data Source. A Data Source wizard will open with a Welcome screen; select Next and you should find a screen to define your connection. We need to define a new connection, so select New and a screen should appear as shown below. Point the connection to the AdventureWorksDW2008R2 database and click OK.
After this, you need to specify the impersonation information for the data source. This information determines which credentials Analysis Services uses when it connects to the data source. Every time you deploy or process the solution, this connection information will be used, so keep in mind that the account you use should have sufficient privileges. If you are not sure which account to use, you can use an account with administrator privileges on your development machine. Please keep in mind that this is not recommended and should not be done in production environments. It is suggested here only to get you started quickly with cube design and development.
After specifying this information, click Next. This should take you to the final screen where you need to name the data source. Name it something appropriate and click OK, which should create your data source.

Creating a Star Schema Using a Data Source View
OVERVIEW
A data warehouse or data mart from which we source our data could contain tens to hundreds of tables. Also, one would usually not have the liberty to change the schema of these tables to suit the requirements of the cube design. The Data Source View is an insulation layer between the actual data source and the solution. We can create and modify the schema we need in this layer, and it is used as the data source for the different objects we create in the solution. A Star Schema is a schema structure where the dimension tables are directly connected to the fact table. If you imagine a fact table in the center with the dimensions attached around it, the figure looks similar to a star, hence the name star schema. It is the simplest form of schema, and hence we will use it in our exercise.
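As a rough illustration of what "directly connected" means, the Python sketch below describes a schema as a mapping from each table to the tables it references and checks whether it has the star shape. The is_star_schema helper and the foreign keys listed are simplified assumptions made for this example, not the full AdventureWorks DW schema.

```python
# Hypothetical schema description: each table maps to the tables it references.
SCHEMA = {
    "FactInternetSales": ["DimDate", "DimProduct", "DimSalesTerritory"],
    "FactResellerSales": ["DimDate", "DimProduct", "DimSalesTerritory"],
    "DimDate": [],
    "DimProduct": [],
    "DimSalesTerritory": [],
}

def is_star_schema(schema):
    """True when no dimension references another table (no snowflaking)
    and every dimension is referenced directly by at least one fact table."""
    facts = {table for table in schema if table.startswith("Fact")}
    dims = set(schema) - facts
    referenced = {dim for fact in facts for dim in schema[fact]}
    no_snowflake = all(not schema[dim] for dim in dims)
    return no_snowflake and dims <= referenced

# A snowflaked variant: DimProduct now points at a further lookup table.
SNOWFLAKE = dict(SCHEMA)
SNOWFLAKE["DimProduct"] = ["DimProductSubcategory"]
SNOWFLAKE["DimProductSubcategory"] = []

print(is_star_schema(SCHEMA))     # True
print(is_star_schema(SNOWFLAKE))  # False
```

In a Data Source View the same shape shows up visually: in a star, every relationship line runs from a fact table straight to a dimension table.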
EXPLANATION
Right-click the Data Source Views folder and select New Data Source View; a wizard should pop up with a Welcome screen. Select Next, and the next screen should prompt you to select a relational data source. Select the data source we just created and click Next. The next screen should prompt you to select the tables that we intend to use in our solution. Select the tables as shown in the below screenshot. These fact and dimension tables are chosen because they are interlinked with each other and also suit the requirements of the exercises to follow.
Select Next, name the DSV to something appropriate and this should finally create your Data Source View. After arranging the tables in the DSV, your schema should look similar to the below screenshot.
In the above figure, you can see that both fact tables are related to all three dimensions in the same manner. This is a typical case of a star schema. You can also browse the data, create calculated fields, assign primary keys, and carry out other similar functions in this designer to modify the schema without modifying the actual schema in the database.

Designing a Cube
Using BIDS, after the DSV is developed, the next step is to create dimensions. Dimensions are of two types: database dimensions and cube dimensions. Database dimensions can be perceived as a master template, and cube dimensions can be perceived as instances / children of this master template.
We will start our development with the creation of database dimensions. If you consider a dimension as a table, all the fields in this table can be perceived as attributes. A hierarchy in a dimension is a group of attributes logically related to each other with a defined cardinality. Finally, we will create a cube from the dimensions we just developed and from the fact tables; the cube will contain cube dimensions (based on the database dimensions) and measure groups (based on the fact tables).

Creating a Dimension

Dimensions are of two types: database dimensions and cube dimensions. The dimensions that are defined at the solution level are termed database dimensions, and the ones defined inside the cube are termed cube dimensions. The Dimension Wizard is the primary means of creating a dimension. We will create a dimension from each of the three dimension tables which we have included in our schema.
EXPLANATION
Right-click the Dimensions folder and select New Dimension, this will invoke the Dimension Wizard. The first screen should look like the below screenshot. You have the options of using an existing table, creating a table in the data source and using a template. We already have the dimension table in our schema and we will use this, so select Use an existing table and click Next.
Select the DSV we created earlier in the DSV selection. We intend to create a dimension from the DimSalesTerritory table, so select that table. Every dimension table needs a key attribute, and in this table SalesTerritoryKey is the primary key column, which is guaranteed to identify each record uniquely. Browsing the dimension by this key would not be meaningful to users. The SalesTerritoryRegion field also has unique values, so it could serve as both the key and the name column. For the purpose of our exercise, however, we will use SalesTerritoryKey as the key column and SalesTerritoryRegion as the name column. Using the key field may look unnecessary here, but while you are developing an understanding of dimensions it helps to set a rule in your mind: the key field is always required and is usually a surrogate key, and you can set the name column to any field that makes browsing convenient.
In the next screen, you need to select the attributes that will be present in the dimension. If you uncheck the Enable Browsing option for an attribute, it won't be visible to client applications when they browse the dimension. Attributes can be of different types and you can specify the type in the Attribute Type field. The Dimension Wizard removes the field you set as the name column from this list, because it is already available through the key attribute. So you won't find that field in the list of available attributes.
The next step is to give a name to the dimension. Name it Cube Dim Sales Territory or anything appropriate. After this step you have completed creating your first dimension.
In a similar manner, create the Product and Date dimensions using the Dimension Wizard.
Creating a Hierarchy
A hierarchy is a set of logically related attributes with a fixed cardinality. While browsing the data, a hierarchy exposes the top level attribute, which can be broken down into lower level attributes. For example, Year -> Semester -> Quarter -> Month is a hierarchy. While analyzing the data, it might be required to drill down from a higher level to a detail level, and exposing data as a hierarchy is one of the best solutions for this.
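To make the idea of drilling down concrete, here is a minimal Python sketch (not SSAS code) that sums a measure at successive levels of a Year -> Semester -> Quarter -> Month hierarchy. The rows, the amounts and the `roll_up` helper are all hypothetical and only illustrate how each level breaks the totals of the level above it into finer groups.

```python
from collections import defaultdict

# Hypothetical fact rows: (year, semester, quarter, month, sales amount)
rows = [
    (2007, 1, 1, 1, 100.0),
    (2007, 1, 1, 2, 150.0),
    (2007, 1, 2, 4, 200.0),
    (2007, 2, 3, 7, 50.0),
    (2008, 1, 1, 1, 75.0),
]

LEVELS = ["Year", "Semester", "Quarter", "Month"]

def roll_up(rows, depth):
    """Sum sales at the given hierarchy depth (1 = Year ... 4 = Month)."""
    totals = defaultdict(float)
    for row in rows:
        path = row[:depth]      # e.g. (2007, 1) at the Semester level
        totals[path] += row[-1]
    return dict(totals)

by_year = roll_up(rows, 1)
by_semester = roll_up(rows, 2)

# Show the first two levels of the drill-down
for depth, level in enumerate(LEVELS[:2], start=1):
    print(level, roll_up(rows, depth))
```

Note that each lower level is identified by its full path, such as (2007, 1), rather than by the semester number alone. This detail becomes important when the Date dimension is processed later.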
EXPLANATION
Creating a hierarchy is as easy as dragging and dropping attributes in the hierarchy pane of the dimension editor. We want to create a hierarchy in the Sales Territory dimension. Open Sales Territory dimension in the dimension editor, drag and drop attributes in the hierarchy pane, click on each of them and rename them to something appropriate. After completing this, your hierarchy should look similar to the below screenshot.
You will find a warning icon on the hierarchy pane, which says that attribute relationships are missing between these attributes. Country has a one-to-many relationship with Region, and Group has a one-to-many relationship with Country. But these relationships need to be defined explicitly in the dimension. Click on the Attribute Relationships tab, right-click the Region attribute and select New Attribute Relationship. Set the values as shown in the below screenshot to correct the relationships between these attributes.
After you have applied the above changes, your attribute relationship tab should look like the below screenshot.
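Before defining an attribute relationship, it can help to confirm that the data really is one-to-many. The following Python sketch is only an illustration of that check: the rows and the `violations` helper are hypothetical, and the function simply reports any child value that maps to more than one parent value.

```python
from collections import defaultdict

def violations(rows, child, parent):
    """Return child values that map to more than one parent value."""
    parents = defaultdict(set)
    for row in rows:
        parents[row[child]].add(row[parent])
    return {c: sorted(p) for c, p in parents.items() if len(p) > 1}

# Hypothetical sales territory rows
territory = [
    {"Region": "Northwest", "Country": "United States", "Group": "North America"},
    {"Region": "Canada", "Country": "Canada", "Group": "North America"},
    {"Region": "France", "Country": "France", "Group": "Europe"},
]

region_country = violations(territory, "Region", "Country")
country_group = violations(territory, "Country", "Group")

# A deliberately inconsistent row: the same region under a second country
bad = territory + [
    {"Region": "Northwest", "Country": "Canada", "Group": "North America"}
]
bad_region_country = violations(bad, "Region", "Country")
```

An empty result means every child value has exactly one parent, which is what the relationship between Region and Country, and between Country and Group, assumes.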
If you have observed carefully, relationships are of two types: Rigid and Flexible. The type has an effect on the processing of the cube. Rigid means that you do not expect the relationship to change, and Flexible means that relationship values can change. In our dataset, Group is a logical way to categorize countries and it can change, while regions within a country have limited or no change. So the relationship type between Country and Group should be flexible, and the relationship type between Region (the sales territory key) and Country should be rigid. Double click on the arrow joining the key attribute and Country, and change the relationship type as shown below.
Check the Hierarchy pane, and you should find that the warning icon is no longer visible. You can change the name of the hierarchy to something appropriate. In the interest of beginners who might get confused by the distinction between attributes and hierarchies, we will keep the name as Hierarchy. Edit the Date dimension, and create a Year -> Semester -> Quarter -> Month hierarchy in it.
Creating a Cube using the Cube Wizard
A cube acts as an OLAP database to the subscribers who need to query data from an OLAP data store. A cube is the main object of an SSAS solution, where the majority of fine tuning, calculations, aggregation design, storage design, relationship definitions and many other configurations are developed. We will create a cube using our dimensions and fact tables.
EXPLANATION
Right-click the Cube folder and select New Cube, and it will invoke the Cube Wizard. In the first screen you need to select one of the methods of creating a Cube. We already have our dimensions ready, and schema is already designed to contain dimension and fact tables. So we will select the option of Use existing tables.
In the next screen, we need to select the tables which will be used to create measure groups. We already have a DSV which has fact tables in the schema. So we will use this as shown in the below screenshot.
In the next screen, we need to select the measures that we want to create from the fact tables we just selected in the previous screen. For now, select all the fields as shown below and move to the next screen.
In this screen you need to select any existing dimensions. We have created three dimensions and we will include all of these dimensions as shown below.
In the next screen, we can select if we want to create any additional new dimensions from the tables available in the DSV. We do not want to create any more dimensions, so unselect any selected tables as shown below and move to the next screen.
Finally you need to name your cube, which is the last step of the wizard before your cube is created. Name it something appropriate like Sales Cube as shown below.
Your cube should now have been created, and if your cube editor is open you should find different tabs to configure and design various features and aspects of the cube. If you look carefully at the below screenshot, you will find the FactInternetSales and FactResellerSales measure groups. You will also find the Sales Territory and Product dimensions, but the Date dimension is missing. Both fact tables have multiple fields referencing the DateKey from the Date dimension. BIDS intelligently creates three cube dimensions from the Date dimension and names each one after the fact table field that references it. So you will find three instances of the Date dimension: the Ship Date, Due Date and Order Date dimensions. These are known as role-playing dimensions.
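The idea behind role-playing dimensions can be sketched in plain Python: one date table is looked up several times, once per foreign key in the fact row. The column names, keys and values below are assumptions made for illustration and are not read from the actual database.

```python
# Hypothetical date dimension keyed by DateKey
dim_date = {
    20070101: "2007-01-01",
    20070105: "2007-01-05",
    20070110: "2007-01-10",
}

# One hypothetical fact row with three foreign keys into the same date table
fact_row = {
    "OrderDateKey": 20070101,
    "ShipDateKey": 20070105,
    "DueDateKey": 20070110,
    "SalesAmount": 250.0,
}

# Each role is the same dimension reached through a different fact column
ROLES = {
    "Order Date": "OrderDateKey",
    "Ship Date": "ShipDateKey",
    "Due Date": "DueDateKey",
}

def resolve_roles(fact, dim, roles):
    """Look up the same dimension table once per role."""
    return {role: dim[fact[fk]] for role, fk in roles.items()}

resolved = resolve_roles(fact_row, dim_date, ROLES)
```

Only one copy of the date data exists; the three roles differ only in which fact column is used for the lookup.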
OVERVIEW
Once the cube design and development is complete, the next step is to deploy the cube. When the cube is deployed, a database for the solution is created in the SSAS instance, if not already present. Each of the dimension and measure group definitions is read, and data is calculated and stored as per the design and configuration of these objects. Once the cube is successfully deployed, client applications can connect to the cube and browse the cube data. We will deploy the cube we have developed and test connecting to it. We might also face errors during deployment, and we will attempt debugging and resolving these errors.
Debugging Deployment Errors
In a development environment, you will typically come across errors during deployment and processing of the cube. Debugging errors is an essential part of the cube development life cycle. We will configure the deployment properties, and we should face some errors during the deployment. We will then analyze and resolve these errors.
EXPLANATION
Right-click the solution and select Properties to bring up a pop-up window. Select the Deployment tab to see the deployment properties. Specify the SSAS server name and the name of the database to be created for your solution in the SSAS instance. Since SSAS is installed on my local / development machine, I have chosen localhost as the server and Sales as the name of the database. We will keep the rest of the options as default for now.
Right-click the solution and select Deploy, this will start deploying the solution. If you have not specified an appropriate account in the impersonation information, your deployment might fail as the account might not have sufficient privileges. If you have followed all the previous steps as explained, you should face errors as shown below. From the error message you can make out that cube processing failed due to the Date dimension. Right-click the Cube Dim Date dimension and select Process, and you would find the following error.
If you recall, we defined a hierarchy in the Date dimension, Year -> Semester -> Quarter -> Month, and each attribute relationship is expected to be one-to-many. If you browse the data, you will find that the same set of semester values exists in every year. By default the key column of the Semester attribute is Semester itself, which is not unique across years, so processing finds duplicate Semester keys. We therefore need to make each attribute unique by changing its key columns.
Edit the Date dimension in the dimension editor, select the Semester attribute and edit the Key Columns property. This should bring up a pop-up window as shown below. To make the Semester attribute unique, we need a composite key of Year + Semester. So select the key columns as shown below.
When you select multiple columns as the key, the Name Column property becomes blank, and it is a mandatory property. So select this property and set it again to Semester, as we want to display semesters when this attribute is browsed.
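The effect of the composite key can be illustrated outside SSAS with a short Python sketch. It is not how Analysis Services detects duplicates internally; the rows and the `duplicate_keys` helper are hypothetical and only show why Semester alone collides across years while Year + Semester does not.

```python
from collections import Counter

# Hypothetical distinct (year, semester) combinations from the date table
date_rows = [(2007, 1), (2007, 2), (2008, 1), (2008, 2)]

def duplicate_keys(rows, key):
    """Return key values shared by more than one distinct row."""
    distinct = set(rows)
    counts = Counter(key(r) for r in distinct)
    return sorted(k for k, n in counts.items() if n > 1)

# Key column = Semester only: semester 1 and 2 occur in both years
semester_only = duplicate_keys(date_rows, key=lambda r: r[1])

# Key columns = Year + Semester: every combination is unique
year_and_semester = duplicate_keys(date_rows, key=lambda r: (r[0], r[1]))
```

The same reasoning applies to the Quarter and Month attributes if their values repeat from year to year.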
This should solve the error we were facing on the Date dimension. Duplicate keys are one of the most common errors during dimension processing, and we just learned how to resolve this issue.
Processing Dimensions and Cube
SSAS provides various cube processing methods, along with options to configure error logging and the impact on processing when errors are encountered. We will briefly look at these options, understand what processing of the cube means, deploy our cube and try to access data from the cube.
EXPLANATION
Right-click on the dimension or cube and select Process, and this should bring up a screen with processing options similar to the below screenshot. Various processing options are visible in the dropdown. Unprocess removes the data and aggregations created by processing the object. Process Full does the same, and then processes the object again, recreating all the aggregations. More information about these options can be found in SQL Server Books Online (BOL) on MSDN. In the Change Settings and Impact Analysis options you will find error configuration and other options related to processing.
Deploy the cube and the cube should be deployed successfully. Go to the Browser pane after successful deployment, and try to connect to the cube and browse data by dragging and dropping dimension attributes and measures on the browsing area. Below is an example.
Calculated Measures and Named Sets
Fields from fact tables are converted into measures, which are organized into measure groups in a cube. One measure group is created per fact table. In production systems, developing calculated measures is a regular requirement. Multi-Dimensional Expressions (MDX) is the query language for a cube; it is to a cube what T-SQL is to SQL Server. Queries that are frequently used are often required in some ready format in the cube, so that users do not need to develop them over and over again. One solution for this is named sets, which can be perceived as queries already defined in the cube, similar to views in SQL Server. We will develop a calculated measure and a few named sets in this section.
Developing a Calculated Measure
Measures created directly from the fields of a fact table are called base measures. But often we require measures based on custom requirements, so we apply some logic and/or formula to these base measures and create calculated measures. We will add two measures from two measure groups and create a calculated measure.
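As a rough illustration of what such a calculated measure computes, the following Python sketch adds two base measure values per product. The product names and amounts are hypothetical, and treating a missing value as zero is an assumption of this sketch that mirrors how an empty cell is commonly handled in an addition.

```python
# Hypothetical base measure values per product
internet_sales = {"Bike": 1200.0, "Helmet": 300.0}
reseller_sales = {"Bike": 800.0, "Gloves": 150.0}

def total_sales(product):
    """Sum the two base measures, treating a missing value as zero."""
    return internet_sales.get(product, 0.0) + reseller_sales.get(product, 0.0)

# Evaluate the calculated value for every product seen in either source
totals = {p: total_sales(p) for p in sorted(set(internet_sales) | set(reseller_sales))}
```

The calculated value is derived on demand from the two base values, so nothing extra needs to be stored for it in this sketch.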
EXPLANATION
Open the cube designer, and click on the Calculations tab. Click on New Calculated Measure from the toolbar, and key in the values as shown in the below screenshot.
We have named this new calculated measure TotalSales. The Parent hierarchy specifies which hierarchy the measure will be part of, and in this case it is Measures. It's a built-in hierarchy and all measures normally fall under it. In the Expression, we can specify any MDX expression. Here we are adding Internet Sales Amount from the FactInternetSales measure group and Reseller Sales Amount from the FactResellerSales measure group. You do not need to type the values; you can drag and drop them from the panes on the left-hand side of the window. In the additional properties you can set further options for this measure. Save your solution. In the next section we will create named sets and then deploy everything at the same time.
Developing Named Sets
Named sets return a dataset based on defined logic. They are primarily useful for creating datasets that are often requested from the cube. Named sets are of two types: Static and Dynamic. The difference between the two is that a static named set is calculated when it is requested the first time in a session, while a dynamic named set is calculated each time a query references it. In this section we will look at how to create dynamic named sets. Note that dynamic named sets were not introduced until SQL Server 2008.
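The practical difference between the two types can be sketched in Python. This is a simplified model, not SSAS behavior in detail: the rows and function names are hypothetical, the static result is computed once up front, and the dynamic result is recomputed for each query using that query's filter.

```python
# Hypothetical sales rows: (product, year, amount)
sales = [
    ("Bike", 2007, 500.0),
    ("Helmet", 2007, 900.0),
    ("Bike", 2008, 1500.0),
    ("Helmet", 2008, 100.0),
]

def top_product(rows):
    """Return the product with the highest total amount in the given rows."""
    totals = {}
    for product, _, amount in rows:
        totals[product] = totals.get(product, 0.0) + amount
    return max(totals, key=totals.get)

# Static: evaluated once over all data and reused by every later query
STATIC_TOP = top_product(sales)

def dynamic_top(year):
    """Dynamic: re-evaluated per query, within that query's year filter."""
    return top_product([row for row in sales if row[1] == year])
```

In this sketch the static result stays the same regardless of the year a query filters on, while the dynamic result can change from one query to the next.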
EXPLANATION
Open the cube designer, and click on the Calculations tab. Click on New Named Set from the toolbar and key in the values as shown in the below screenshots.
Here we are creating two named sets, Internet Sales Top 25 and Reseller Sales Top 25. These named sets return the top 25 products based on Internet Sales and Reseller Sales respectively. In this formula, the MDX function TopCount returns the top 25 members from the set. In the Type selection, we can select whether we want the named set to be static or dynamic. We have selected Dynamic as we want to create a dynamic named set. In the Display folder selection, we can specify where the named sets will appear. By default named sets appear in the last dimension that is used in the formula. Here we have used an attribute hierarchy from the Product dimension, so the named sets should appear in the same dimension under the Named Sets directory. Save and deploy the solution, and then re-connect to the cube in the Browser pane. You should be able to see the calculated measure and named sets as shown in the below screenshot.
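The behavior of TopCount described above can be approximated in Python as "sort descending by a value and keep the first N". The `top_count` helper and the sample figures are hypothetical and are meant only to illustrate the idea, not to reproduce every detail of the MDX function.

```python
def top_count(members, count, value):
    """Return the `count` members with the highest value, highest first."""
    return sorted(members, key=value, reverse=True)[:count]

# Hypothetical Internet Sales Amount per product
internet_sales = {
    "Road Bike": 900.0,
    "Helmet": 120.0,
    "Mountain Bike": 1400.0,
    "Gloves": 40.0,
}

# Top 2 products by Internet Sales Amount (the tutorial uses 25)
top2 = top_count(internet_sales, 2, internet_sales.get)
```

Swapping in a different value function, such as reseller sales, gives the equivalent of the second named set.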
Browsing a Cube Using Excel
Once the cube is deployed and ready to host queries from the data store, client applications can start querying the cube. One of the most user friendly client tools for business users to query a cube is Microsoft Excel. It has a built-in interface and components to support GUI based connection, querying and formatting of data sourced from a cube. Business users can use the familiar interface of Excel and create ad-hoc pivot table reports by querying the cube without any detailed knowledge about querying a multi-dimensional data source. We will connect to the cube we just created using Excel and develop a very simple report using the cube data.
Using Excel and Creating a Pivot Table Report
OVERVIEW
We will first create a connection to the cube we have developed in the previous exercises. After connecting the cube we will use the calculated measures and a named set to create a very basic pivot table report. For the purpose of demonstration, Excel 2010 is used and is installed on the development machine, but you can also use Excel 2007 to connect to the cube.
EXPLANATION
Open Microsoft Excel and select the Data tab from the menu ribbon. Click on From Other Sources and select From Analysis Services option as shown in the below screenshot.
In the next step specify the SSAS server name and logon credentials. If you have everything on the local machine, you can also use localhost as the server name.
If you were able to successfully connect to the specified SSAS instance with the logon credentials specified, in the next step you should be able to select the SSAS Sales database and find the Sales Cube. Select the Sales Cube and proceed to the next step.
In the next step, specify the name of the connection file to save. This file will be saved as an .ODC file and you can reuse this connection file when you want to use the same connection in other workbooks.
After saving the file, you will be prompted with the option to select the kind of report you want to create. We will go with the default option and select PivotTable Report.
After selecting PivotTable Report, a designer will open with options to select dimension, attributes and measures to populate your pivot table. Select the values as shown in the below screenshot. Our intention is to display the hierarchy we created in the Sales Territory dimension on the columns axis, Internet Sales Top 25 named set on the rows axis, and the Total Sales calculated measure in the values area.
After making the above selections, your report should look like the below screenshot. Using the features available from the Options tab, you can format this report and give it a more professional look. You can try drilling down the hierarchy; drill-down is available only where a hierarchy has been developed in the dimension. Users who frequently want to see sales of the top products can pick any of the named sets that we defined earlier. Instead of having users define formulas for adding internet sales and reseller sales, users can just select Total Sales.
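For readers who want to see what the pivot table is doing with the cube data, here is a small Python sketch of the same layout: products on rows, regions on columns and a summed sales value in each cell. The cell values and the `pivot` helper are hypothetical and unrelated to Excel's own implementation.

```python
from collections import defaultdict

# Hypothetical cells: (product, region, total sales)
cells = [
    ("Bike", "Northwest", 100.0),
    ("Bike", "France", 40.0),
    ("Helmet", "Northwest", 10.0),
    ("Bike", "Northwest", 60.0),
]

def pivot(cells):
    """Group by product (rows) and region (columns), summing the value."""
    table = defaultdict(lambda: defaultdict(float))
    for product, region, amount in cells:
        table[product][region] += amount
    return {product: dict(columns) for product, columns in table.items()}

report = pivot(cells)
```

In the actual report the aggregation is done by the cube, and Excel only arranges the returned values on the row and column axes.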
SQL Server Analysis Services Glossary
Following is a list of common terms used when working with SQL Server Analysis Services.
Cube - A cube is a multi dimensional data structure composed of dimensions and measure groups. The intersection of the dimensions and measure groups contained in a cube returns the dataset.
Calculated Measure - Each field in a measure group is known as a base measure. Measures created using MDX expressions, with or without base measures, are known as calculated measures.
Data Source View - It's an insulation layer that inherits the basic schema from the data source, with the flexibility to manipulate the schema in this layer without modifying the actual schema in the data source.
Dimension - A dimension is an OLAP structure that is basically used to contain attributes related to an entity, to categorize data on the row / column axis. A dimension almost never contains measurable numeric data, and if it does, that data is used as an attribute. Typical examples of dimensions are Geography, Organization, Employee, Time etc.
Fact - A fact, known as a measure group in a cube, is an OLAP structure that is basically used to contain measurable numeric data for one or more entities. In cube parlance these entities are known as dimensions. A dimension need not necessarily be associated directly with a fact, but a fact is always associated directly with at least one dimension. Typical examples of facts are Sales, Performance, Tax etc.
Hierarchy - A hierarchy is a collection of nested attributes associated in a parent-child fashion with a defined cardinality. A dimension is formed of attributes, and a hierarchy contained in a dimension is formed of one or more attributes from the same dimension.
KPI - Key Performance Indicators are logical structures defined using MDX expressions. Each KPI has a goal, status, value, trend, and indicator associated with it. The value is derived from the definition of the KPI, and the rest of these properties vary based on this derived value. KPIs are the primary elements that make up a scorecard in a dashboard.
MDX - Multi Dimensional Expressions is the query language of multi dimensional data structures. It can be considered the SQL of OLAP databases, with the major difference that MDX is mostly used for reading data only.
Named Set - A named set is a pre-defined MDX query defined in the script of the cube. It can be thought of as analogous to a view in a SQL Server database. Named sets can be dynamic or static, and this nature defines when the query gets evaluated.
OLAP - Online Analytical Processing is a term used to represent analytical data sources and analysis systems. The fundamental expectation associated with the term OLAP is that it refers to multi dimensional data and the environment hosting it.
Snowflake Schema - A snowflake schema is an OLAP schema where one or more normalized dimension tables are associated with a fact table. For example, Product Sub Category -> Product Category -> Product can be three normalized dimension tables, and the Product table can be associated with a fact table like Sales. This is a very common example of a snowflake schema.
Star Schema - A star schema is an OLAP schema where all dimension tables are directly associated with fact tables, and no normalized dimension tables are present in the schema. For example, Time, Product and Geography dimension tables would be directly associated with a fact table like Sales. This is a very common example of a star schema.