office (715) 262-3601
The goal of this seminar is to help you get started with OLAP, so you learn how to bring your data from a relational database into cubes. The goal of this first session is to describe the structure of OLAP data.
In the second session we will be describing how to find the OLAP structures in a relational database. But before we can do that we have to describe the OLAP structures themselves – what are we trying to create in an OLAP database?
This seminar was originally based on Analysis Services in SQL Server 2000. It has been updated to include information from Analysis Services in SQL Server 2005.
If you haven’t ever browsed an OLAP cube, you need to experience what it’s like. Try the cubes on this dashboard.
OLAP presents data in a different way. You need to be familiar with the look and feel of OLAP data. You also need to be familiar with some OLAP concepts. The main content of this session is a definition/description of 30 OLAP terms. Many of these concepts are needed by everybody who uses OLAP. Others are only important for people who are developing an OLAP system.
If you’re just starting out with OLAP, try browsing a cube as you’re reading these definitions. You need to see how dimensions, levels, measures, etc. actually look to an OLAP end-user.
Here is my Top 30 List of OLAP Concepts. For each one I give a short-hand definition (which is highlighted) and a lengthier description.
OLAP
Browsing
Data Mining
Cube
Dimension
Level
Drilling Up
Drilling Down
Drill-Through
Hierarchy
Member
Set
Member Property (Attribute)
Child Members
Slicing
Dicing
Tuple
Measure (Fact)
Calculated Measures
Cell
Current Member
Dimension Table
Fact Table
Star Schema
Snowflaking
MDX
Unbalanced Hierarchy
Ragged Hierarchy
Actions
Local Cube
OLAP (The Acronym) – OnLine Analytical Processing – The logical meaning of OLAP would seem to include any computer application that is used to analyze data, but nobody really uses it like that. OLAP is one of the worst acronyms I have ever seen – when you tell someone what the letters stand for, you still don’t have the slightest idea what the software actually looks like.
OLAP (Popular Definition) - A Million Spreadsheets in A Box – This is what you can tell people about OLAP who have never seen it before. OLAP is basically a spreadsheet tool – pretty powerful and flexible – but its basic purpose is to show spreadsheets. The key for OLAP is the ability to navigate to different views of the data. You don’t have to ask your technical people every time you want to see your data in a new way. Your OLAP tool allows you to move quickly and easily from one perspective to another. Do you see some information that interests you? With OLAP you can look at that data from a more detailed (or a more general) perspective.
OLAP (Another Popular Definition) - Reporting With Something Extra - A standard paper-based executive report shows two dimensions - a cost center displayed on each column, an account displayed on each row, with dollar figures in the middle. An OLAP cube adds extra dimensions - trends over time, accounts displayed for different geographical areas or for different products. You can easily move among different perspectives and between a more detailed or a more general perspective.
OLAP (Technical Definition) - Fast, Interactive Browsing of Multidimensional, Multi-Level Data –
OLAP always involves multiple dimensions, which should have multiple levels. (If you don’t have levels, your OLAP browsing doesn’t have much power.) OLAP has to be fast.
OLAP is interactive. OLAP browsing is something done by a human analyst. As the analysts view the data, they can ask new questions of the data and receive immediate answers.
If you have a data analysis application that doesn’t return results of new queries (almost) immediately, you don’t have OLAP. How fast is fast enough? Less than 5 seconds is OK, but less than 1 second is a lot better. The value of Analysis Services is that it makes this fast query response possible. The data is organized in multidimensional structures, with some of the summary (aggregated) views in the data calculated ahead of time.
If I had my way, we would throw the term OLAP away and replace it with something like Interactive Spreadsheeting. Or maybe we should replace the term “OLAP Browsing” with “Data Browsing”.
Data Mining – Computer Analysis of Data that is Not Interactive – You can use Analysis Services for both OLAP and data mining. Both are methods of analyzing data. The big difference is that OLAP is used interactively. A person browses OLAP data to find the significant information. Data mining is a process where the computer analyzes data and then reports the significant results back to the analyst.
(NOTE: There is at least one Analysis Services client tool that allows a user to switch back and forth between OLAP and data mining. The user can start out with data mining and then analyze the significant findings with OLAP. Insights gained from OLAP browsing can then be used for additional data mining.)
Cube – A Multidimensional Way to Look at Data – The cube is the primary OLAP structure used to view data. It is analogous to a table in the relational database world. The term cube implies three dimensions, but OLAP folk have stretched the concept a little bit. An Analysis Services cube can have 128 dimensions (though it’s probably not a good idea to have that many).
Dimensions (Definition #1) - The Perspectives Used for Looking at the Data – Dimensions are the answer to the question “How do you want to see your data?” Here are some examples of dimensions –
Product
Time
Store
Customer Age
Customer Income
Employee
Dimensions (Definition #2) - The Categories You Use for the Columns and Rows in the Spreadsheet - Do you want to see how different products have done in different time periods? Put the different products on the columns and the different time periods on the rows (or the other way around). How about looking at the age of customers who are shopping at different stores? Use a different row for each age group and a separate column for each store.
Dimensions Have Levels Which Are Used for Drilling Down and Drilling Up – If you just had dimensions, OLAP browsing would be kind of dull. The value of OLAP comes in having good levels for your dimensions. Levels let you see the general view of things and the detailed view of things. If you notice that sales are higher in a particular month, you will maybe want to drill down to see if they were higher in a particular part of the month. You also might want to drill up to a higher level, to see if the data patterns are valid on a wider scale.
Levels are Organized Into Hierarchies – We should really say that dimensions have hierarchies and it’s the hierarchies that have levels. But many dimensions have only one hierarchy, and when they do, that hierarchy can be ignored, for the most part. But there are times when a single dimension has more than one hierarchy – such as when the levels of a Time dimension are organized by Calendar Year and by Fiscal Year.
Note: Some OLAP tools don’t handle hierarchies very well. You might not see them in the one you’re using, or they might just appear as if they were separate dimensions.
Levels Have Members – the Labels for the Columns and Rows in the Spreadsheet – If you have a dimension called Time, one of the Levels might be called Month, and the members of the level Month would be the members January, February, March, etc. If you have a dimension called Product, you could have a level called Department, and the members of the level Department could be Hardware, Plumbing, Lumber, etc.
Drilling Down to Child Members – Drilling down is what you do when you’re looking at data at a particular level and you want to see more detailed data. When you drill down to see the details at the next lower level, the members you see are called child members. For example, if you drill down on Quarter 1, you will see the child members January, February, and March.
A whole variety of family terms are used to describe the relationship between members within a dimension – parent, sibling, cousin, descendants, ancestors, etc.
Drill-through - Jumping from OLAP Back to the Source Data - In OLAP you can drill up and drill down to view different levels in the data. Some OLAP systems also allow you to jump back to the source data. You can select data you find interesting in the cube and drill through to the source data to view extra detail.
A Collection of Members is Called a Set - You nearly always want to see more than one column and more than one row in a spreadsheet. The group of members that is shown on the columns and the group of members that is shown on the rows are called sets. Sets can be defined in OLAP in a large number of different ways. Here are a few examples
1. By listing each member – {January, February, March}
2. By a family relationship – the children of Quarter 1 – [Quarter
1].Children
3. All the members of a level – Time.Month.Members
4. All the members of a dimension – Time.Members
Members Have Extra Information Called Member Properties (or Attributes) – You can display extra information about members of any level by creating member properties. An Employee level could have member properties such as Phone Number, Birth Date, and Hire Date. A City level could have a member property Population, which could be used to calculate per capita sales.
OLAP Filtering Is Called Slicing – You can use a dimension for slicing when it isn’t being displayed on the columns or rows of the spreadsheet. You can slice on any member from any level of the dimension. If you slice on the member January in the Month level of the Time dimension, you will only see data from January.
Dimensions Can Be Combined For Dicing – You can put more than one dimension on the rows or on the columns. You will see one row for every combination of the members from each of the dimensions. If you put the children of Quarter 1 from the Time dimension and the members of the Product Family level of the Product dimension on the rows, you would have nine rows, as follows:
January Food
January Drink
January Non-Consumable
February Food
February Drink
February Non-Consumable
March Food
March Drink
March Non-Consumable
When you are cutting the data with one member it’s called slicing. When you cut your data with sets of members from two or more dimensions, it’s called dicing, like you’re cutting up vegetables from every direction.
The Members Defining a Row (or a Column) Are Called a Tuple – In geometry, a point is defined by its x-y coordinates (x, y). You can think of x and y as being members of the x-dimension and the y-dimension. In OLAP, each row or column is defined by the members that are used for that row. The first row in the previous example is defined by the tuple (January, Food).
A Group of Tuples Are Called a Set – The 9 tuples in the example are also called a set. Before I said that a group of members make a set. That’s true, sort of, but it’s really a group of tuples that make a set. In the simplest case, those tuples only have a single member.
To summarize members, tuples, and sets:
The rows and the columns of an OLAP spreadsheet are always defined
by a set of tuples.
Each row or column is defined by a single tuple.
Each tuple is defined by a single member from one or more
dimensions. In a simple case, when only a single dimension is used
for a row or column, the tuple has only a single member.
(NOTE: You don’t really have to understand the concepts of tuples and sets when you’re first starting with OLAP, but they’re very important when creating calculations, so you might as well start thinking about them. If you don’t see the point of these concepts at first, don’t worry. But if you start hearing about them now, you’ll have an easier time when creating calculations a little later.)
The Numbers Are Called Measures (or Facts) – The numbers in the OLAP spreadsheet are called measures. When setting up OLAP cubes, these values are also often called facts. Typical measures or facts would be:
Sales Dollars
Sales Count
Profit
Hours of Work
The Numbers are Displayed in the Cells of a Cube – The spaces in the OLAP spreadsheet are called cells (like the cells in an Excel spreadsheet).
Calculated Measures - Applying Formulas to Multidimensional Data - Some measures are more interesting when you combine them with other measures or analyze them from the perspective of a particular dimension. With calculated measures you can view data across different time periods or calculate the percentage one particular product contributes to total sales.
Each Cell Has One (And Only One!) Current Member From Every Dimension – Every cell in a cube is defined by one member from every dimension in the cube. That one member is called the current member for that dimension. The current member for each dimension is determined as follows:
1. If the dimension is used on the columns or on the rows, the
current member is the member for that dimension used in the tuple
defining the column or row.
2. If the dimension is used for slicing, the current member is the
member used in the slice.
3. If the dimension is not used for columns, rows, or slicing, the
current member is the default member for the dimension. The default
member is often the All level member. When the All level member is
used, the practical effect is to ignore that particular dimension in
the display of the data.
Dimension Information Is Stored In Dimension Tables – The information used for OLAP dimensions is contained in relational database tables which are often called dimension tables. Dimension tables have the following types of fields:
1. Key fields, used to join the dimension tables to the fact
table (and to other dimension tables when snowflaking).
2. Level name fields, used to store the member names for the levels.
For example, the Time dimension table could have a field called
Month, which would have values such as January, February, March,
etc.
3. Level Order Key fields, used to store integer values used to
order the members of the levels (if necessary). For example, the
Time dimension table could have a field called Month Order Key,
which could have a value of 1 for January, 2 for February, 3 for
March, etc.
4. Member Property fields, used to store the member property
information. A Time dimension could have a field called Day Count,
which would store the number of days for each month.
Information for the Measures Is Stored in the Fact Table – The information used for the measures is contained in a relational database table which is called the fact table. Whenever you make a new cube in the Analysis Manager, the first question you’re asked is to identify the fact table. Fact tables have the following types of fields:
1. Key fields connecting the fact table to a dimension table for
each of the dimensions. Every fact must be connected to one
particular member from the lowest level of every one of the
dimensions in the cube.
2. Measure fields, containing the numeric values used for the
measures.
The Star Schema – A Multidimensional Data Structure in a Relational Database – The combination of a fact table with its dimension tables is called a star schema. The fact table is at the center of the star, while each dimension table represents a point of the star.
Snowflaking the Star Schema (The Snowflake Schema) – The basic star schema has a simple structure where all the information for each dimension is contained in a single table. Snowflaking is the process of dividing information for one dimension among two or more tables. The resulting data structure is often called a snowflake schema. (I actually prefer to call this structure a star schema that has some snowflaking, but I think my preference for this terminology is a minority opinion.)
Analysis Services comes with a sample database called FoodMart 2000. This OLAP database is based on a Microsoft Access database, which is located, by default, at C:\Program Files\Microsoft Analysis Services\Samples\foodmart 2000.mdb. Take a look at this database to see dimension tables and fact tables.
Right click on the Sales cube from the FoodMart 2000 database in the Analysis Manager and select Edit. Switch to the Schema tab. You will see a star schema that has snowflaking in the Product dimension.
MDX is the Querying Language for OLAP – A special language has been developed for OLAP. This language looks similar to SQL, but it has many unique features. MDX is used for several purposes:
1. To create calculated members – members which are not in the
data, but can be derived from the existing members and measures.
2. To retrieve data from a cube.
3. To create local cube files.
4. To create calculated sets.
5. To define security rules for accessing data.
6. To create actions
Unbalanced Hierarchy
(Parent-Child Dimension) - A Dimension With Levels That Can
Appear Or Disappear - Most dimensions in Analysis Services
have a fixed number of levels. But a parent-child dimension can
display levels that have an indefinite depth. This is useful for
hierarchical structures that are constantly changing, such as who's
supervising who in a company.
Ragged Hierarchy - A Dimension Where Some Parents Have Missing Children (But Do Have Grandchildren) - It's fine in the United States to show a hierarchy that has the Country level, the State level, and the City level. But what happens to countries that don't have states? A ragged hierarchy allows you to skip levels where there aren't any real values.
Actions - Moving Beyond OLAP Browsing To Doing Something - OLAP Browsing has often been a fairly passive activity, but Analysis Services actions allow you to accomplish things right from inside your cubes. An action could allow you to open a web page for a customer. You could find a product that needed ordering and order it on the spot. You could find a salesperson with a high level of sales and send a congratulatory e-mail. You can do it all with actions inside your cubes.
Local Cubes Are Files That Can Be Used For Disconnected Access to Cube Data – OLAP cubes are stored in an Analysis Server. Applications can browse these cubes by connecting to the Analysis Server. But cube data can also be put into local cube files, which can be used when users do not have a connection to the Analysis Server. Browsing a local cube file appears to the users to be the same thing as browsing a cube stored in an Analysis Server. I think local cubes are particularly useful in a pilot project, when everyone is not yet hooked up to an Analysis Server.
The OLAP 30
Those are my top 30 OLAP concepts. If you understand those terms, you’re a long way toward understanding how OLAP data is structured and used.
Finding the Measures, Dimensions, Levels, and Member Properties
The next step is finding measures, dimensions, and levels in a relational database. It’s easy to build cubes if you already have your data organized into a star schema, as is the case with the FoodMart 2000 database. If you have your data in a normal relational database, it’s not so obvious what should be used for measures, dimensions, levels, and member properties.
There are two ways of deciding on the structures that you are going to include in your cubes:
1. Deciding on OLAP structures by looking at the spreadsheets
that are currently being used.
2. Deciding on OLAP structures by looking at the elements that are
available in the source data.
Both strategies are important. If the current spreadsheets are being used, the information in them should probably be included in the OLAP cubes. But the source data may have additional information which has been ignored in existing reports. When you’re building an OLAP system, you have the opportunity to give your users more of the data than they’ve seen before.
Session #2 of this seminar teaches you
how to find OLAP structures by looking in a relational database.