Tag Archives: 5

Interview Question #5: Analysis Using Enterprise Manager

Question

If you want to analyze the average length of time users must wait for their documents or reports to process, as well as the number of errors that were received in Enterprise Manager, which area of analysis would enable you to track this information?

A.   Performance Analysis

B.   Operations Analysis

C.   Real-Time Analysis

D.   Project Analysis

E.   You cannot accomplish this with Enterprise Manager

Answer

B. Operations Analysis

The Operations Analysis folder in Enterprise Manager contains the following analysis areas, each with its own reports:

  • Concurrency analysis (including user/session analysis)
  • Data load
  • Delivery processing analysis
  • Inbox Message Analysis
  • Report processing analysis
  • Resource utilization analysis (including top consumers)

MicroStrategy Course Where You Will Learn About This Topic

MicroStrategy Administration: Application Management Course

An Introduction to Data Blending – Part 5 (Tableau’s Data Blending Architecture)

Readers:

In Part 4 of this series on data blending, we reviewed Tableau’s Data Blending Principles. We also reviewed an example of data blending in Jonathan Drummey’s Patient Falls Dashboard. [3]

Today, in Part 5 of this series, we will peel the onion a bit more and look at Tableau’s Data Blending Architecture.

Again, much of Parts 1 – 5 are based on a research paper written by Kristi Morton from The University of Washington (and others) [1].

You can learn more about Ms. Morton’s research as well as other resources used to create this blog post by referring to the References at the end of the blog post.

Best Regards,

Michael

Integrating Data in Tableau

In Part 5, we discuss in greater detail how data blending works. Then we discuss how a user builds visualizations using data blending using several large datasets involving airline statistics.

Data Blending Architecture

Part 5 - Figure 1

The data blending system, shown in Figure 1 above, takes as input the VizQL query workload generated by the user’s GUI actions and data source schemas, and automatically infers how to query the data sources remotely and combine their results on-the-fly. The system features a two-tier mediator-based architecture in which the VizQL query workload is analyzed and partitioned at runtime based on the corresponding data source fields being used. The primary mediator initiates this process by removing the visual encodings from the VizQL query workload to yield an abstract query. The abstract query is partitioned for further processing by the primary mediator and one or more secondary mediators. The primary mediator creates the mediated schema for the given query workload. It then federates the abstract queries to the primary data source as well as the secondary mediators and their respective data sources. The wrappers compile the abstract queries into concrete SQL or MDX queries and instantiate the semantic mappings between the data sources and the mediated schema for each query. The primary mediator joins all the result sets returned from all data sources to produce the mediated result set used by the rendering system. [1]

Part 5 - Figure 2

Post-aggregate Join

A visualization is organized by its discrete fields into pages, partitions, colors, etc., and like a GROUP BY clause in SQL, these grouping fields comprise the primary key of the visualization. In a blended visualization, the grouping fields from the primary data source become the primary key of the mediated schema. In Figure 2 above, these are shown as the dark-green fields in the primary data source, and the light green fields represent the aggregated data. Each secondary data source must contain at least one field that matches a visualization grouping field in order to blend into the mediated schema. The matching fields in a secondary data source comprise its join key, and fields appear in the GROUP BY clause issued by the secondary mediator wrappers. The aggregated data from the secondary data source, shown in light-purple, is then left-joined along its join key into the mediated result set.  Morton (et al) refer to this left-join of aggregated result sets as a post-aggregate join. [1]

Primary Key Cardinality

many mapping between the domain values of the primary key and those of the secondary join key, because the secondary join key is a subset of the primary key and contains only unique values in the aggregated secondary result set. Morton (et al) find that this approach is the most natural for augmenting a visualization with secondary data sources of uncertain value or quality, which is a common scenario for Tableau users.

Data blending supports many-to-one relationships between the primary and each secondary. This can occur when the secondary data source contains coarser-grained data than the mediated result set, as discussed in Part 3 of this series.

Since the join key in a secondary result set may match a subset of the blended result set primary key, portions of the secondary result set may be duplicated across repeated values in the mediated result set. This does not pose risk of double-counting measure values, becaused all aggregation is performed prior to the join. When a blended visualization uses multiple secondary data sources, each secondary join key may match any subset of the primary key. The primary mediator handles duplicating each secondary result set as needed to join with the mediated result set.

Finally, a secondary dimension which is not part of the join key (and thus not a grouping field in the secondary query) can still be used in the visualization. If it is functionally dependent on the join key, a secondary dimension can be used without affecting the result set cardinality. Tableau references this kind of non-grouping dimension using both MIN and MAX aggregations in the query issued to the secondary data source, which allows Tableau to determine if the dimension is functionally dependent on the join key. For each row in the secondary result set, if the two aggregated values are the same then the value is used as-is, reflecting the functional dependence on the grouping fields. If the aggregated values differ, Tableau represents the value using a special form of NULL called ManyValues. This is represented in the visualization as a ‘*’, but retains the behavior of NULL
when used in calculated fields or other computations. The visual feedback allows a user to distinguish this lack of data from the NULLs which occur due to missing or mismatched data.

Inferring Join Keys

Tableau uses very simple rules for automatically detecting candidate join keys:

  1. The secondary data source field name must match a field with the same name in the primary data source.
  2. The data types must match
  3. If they are date/time fields, they must represent the same granularity date bin in the date/time hierarchy, e.g. both are MONTH. A user can intervene to force a match either by providing field captions to rename fields within the Tableau data model, or by explicitly defining a link between fields using a simple user interface.

Part 5 - Figure 3

Another Simple Blending Example

A Tableau data blending scenario is shown in Figure 3 above, which includes multiple views that were composed in minutes by uniquely mashing up four different airline datasets, the largest of which include a 324 million row ticket pricing database and a 140 million row on-time performance database. A user starts by dragging fields from any dataset on to a blank visual canvas, iteratively building a VizQL statement which ultimately produces a visualization. In this example, the user first drags the VizQL fields, YEAR(Flight Date) and AVG(Airfare), from the pricing dataset onto the visual canvas.

Data blending occurs when the user adds fields from a separate dataset to an existing VizQL statement in order to augment their analysis. Tableau assigns the existing dataset to the primary mediator and uses secondary mediators to manage each subsequent dataset added to the VizQL. The mediated schema has a primary key composed of the grouping VizQL fields from the primary dataset (e.g. YEAR(Flight Date)); the remaining fields in the mediated schema are the aggregated VizQL fields from the primary dataset along with the VizQL fields from each secondary dataset.

Continuing our example, the user wishes to drag AVG(Total Cost per Gallon) from the fuel cost dataset to the visualization. The schema matching algorithm examines
the secondary dataset for one or more fields whose name exactly matches a field in the primary key of the mediated schema. While the proposed matches are often sufficient and acceptable, the user can specify an override. Since the fuel cost dataset has a field named Date, the user provides a caption of Flight Date to resolve the schema discrepancy. At this point the mediated schema is created and the VizQL workload is then federated to the wrappers for each dataset. Each wrapper compiles VizQL to SQL or MDX for the given workload, executes the query, and maps the result set into the intermediate form expected by the primary mediator.

The mapping is performed dynamically, since both the VizQL and the data model evolve during a user’s iterative analytical workflow. Finally, the primary mediator
performs a left-join of each secondary result set along the primary key of the mediated schema. In this example, the mediated result set is rendered to produce the visualization shown in Figure 3(a).

Evolved Blending Example

Figure 3(b) above shows further evolution of the analysis of airline datasets, and demonstrates several key points of data blending. First, the user adds a unique ID field named unique carrier from the primary dataset to the VizQL to visualize results for each airline ID over time. The mediated schema adapts by adding this field to its primary key, and the secondary mediator automatically queries the fuel cost dataset at this finer granularity since it too has a field named uniquecarrier. Next, the user decorates the visualization with descriptive airline names for each airline ID by dragging a field named Carrier Name from a lookup table.

This dataset is at a coarser granularity than the existing mediated schema, since it does not represent changes to the carrier name over time. Morton’s (et al) system automatically handles this challenge by allowing the left-join to use a subset of the mediated result set primary key, and replicating the carrier name across the mediated result set. Figure 4 below demonstrates this effect using a tabular view of a portion of the mediated result set, along with portions of the primary and secondary result sets.

The figure also demonstrates how the left-join preserves data for years which have no fuel cost records. Last, the user adds average airline delays from a 140 million row dataset which matches on Flight Date and uniquecarrier. This is a fast operation, since the wrapper performs mapping operations on the relatively small, aggregated result set produced by the remote database. Note that none of these additional analytical tasks required the user to intervene in data integration tasks, allowing their focus to remain on finding insight in the data.

Part 5 - Figure 4

Filtering

Tableau provides several options for filtering data. Data may be filtered based on aggregate conditions, such as excluding including airlines having a low total count of flights. A user can filter aggregate data from the primary and secondary data sources in this fashion, which results in rows being removed from the mediated result set. In contrast, row level filters are only allowed for the primary data source. To improve performance of queries sent to the secondary data sources, Tableau will filter the join keys to exclude values which are not present in the domain of the primary data source result set, since these values would be discarded by the left-join.

Data Cleaning Capabilities

As mentioned in the Inferring Join Keys section above, Tableau supports user intervention in resolving field names when schema matching fails. And once the schemas match and data is blended, the visualization can help provide feedback regarding the validity of the underlying data values and domains. If there are any data inconsistencies, users can provide aliases for a field’s data values which will override the original values in any query results involving that field. The primary mediator performs a left-join using the aliases of the data values, allowing users to blend data despite discrepancies from data entry errors and spelling variations. Tableau provides a simple user interface for editing field aliases. Calculated fields are another aspect of Tableau’s data model which support data cleaning. Calculated fields support arbitrary transformations of original data values into new data values, such as trimming whitespace from a string or constructing a date from an epoch-based integer timestamp.

As with database fields, calculated fields can be used as primary keys or join keys.

Finally, Tableau allows users to organize a field’s related data values into groups. These ad-hoc groups can be used for entity resolution, such as binding multiple variations of business names to a canonical form. Ad-hoc groups also allow constructing coarser-grained structures, such as grouping states into regions. Data blending supports joins between two ad-hoc groups, as well as joins between an ad-hoc group and a string field.

 Next: Data Blending Using MicroStrategy

———————————————————————————-

References:

[1] Kristi Morton, Ross Bunker, Jock Mackinlay, Robert Morton, and Chris Stolte, Dynamic Workload Driven Data Integration in Tableau, University of Washington and Tableau Software, Seattle, Washington, March 2012, http://homes.cs.washington.edu/~kmorton/modi221-mortonA.pdf.

[2] Hans Rosling, Wealth & Health of Nations, Gapminder.org, http://www.gapminder.org/world/.

[3] Jonathan Drummey, Tableau Data Blending, Sparse Data, Multiple Levels of Granularity, and Improvements in Version 8, Drawing with Numbers, March 11, 2013, http://drawingwithnumbers.artisart.org/tableau-data-blending-sparse-data-multiple-levels-of-granularity-and-improvements-in-version-8/.

 

Tips & Tricks #5: Designing MicroStrategy Documents for Exporting to Excel

Overview

The following best practices will help ensure that your document is displayed correctly when it is exported to Microsoft Excel. When you export to Excel, the resulting spreadsheet looks like a PDF of the document. For example, objects have the same position and size in Excel that they do in a PDF.

For steps to export a document to Excel, see the section Exporting a document in MicroStrategy Web later in this blog entry.

When designing a document that might be exported to Excel, do the following to ensure that the document is displayed correctly in Microsoft Excel:

  • Know how different object types are exported and displayed in Excel, as shown in the following table:

Tip5-1

  • Choose Excel-compatible colors for all objects, including panels, shapes, and Grid/Graphs. Use the set of 40 colors that appear in the Color dialog box in the Document Editor. Excel supports these 40 colors in addition to many more. Other colors are matched by Microsoft Excel as closely as possible. Avoid using gradient colors, since they are not exported to Excel.

Tip 5-2

1 Excel Fill Colors

Tip5-3

2 Excel Font Colors

  • Use graph styles that are supported by Microsoft Excel. For example, if you include a Gauge graph in the document, it is not displayed in Excel. If you include a Combination graph, the exported version in Excel may not be displayed exactly like the original graph in MicroStrategy Web.
  • Avoid overlapping objects. When exported, the document may not be displayed correctly. For example, an object in the background of the document may be displayed in the foreground of the Excel spreadsheet.
  • Provide extra space around objects because they may increase in size when the document is exported to Excel.
  • Use text field borders to create lines and rectangles. Standard MicroStrategy line and rectangle controls may not be displayed correctly in Excel. You can also use a panel stack to create a colored background; for instructions to create panel stacks, see the Dashboard Creation Guide.

Tip5-4

3 Excel Border (Line) Colors and Weights

Avoid inserting line breaks within text fields. Line breaks (inserted by typing CTRL+ENTER) are not rendered in Excel.

  • Do not enable word-wrapping in a column header on a Grid/Graph in MicroStrategy. If you do so, the headers are not displayed correctly in Excel or PDF. Enable word-wrapping in Excel after you export the document.

Tip5-5

4 Turning off Word Wrap in MicroStrategy

Tip5-6

5 Turning off Word Wrap in Excel

  • Use an absolute file path to define the location of an image used in a document. Do not use a relative file path. Images in documents specified with paths relative to MicroStrategy Web and Intelligence Server are not displayed when exported to Excel.

Exporting a document in MicroStrategy Web

Exporting a document allows you view and interact with the document results outside of MicroStrategy Web. For example, you can view results in a PDF file within Adobe Reader, manipulate exported data in an Excel spreadsheet, or create an interactive Flash dashboard for off-line use.

You can export a document to the following formats:

  • PDF file: Export the document to a PDF file in a reader, such as Adobe Reader. You can view the PDF on any device with a PDF reader, such as another computer, a Linux machine, a Nook, or a Kindle.
  • Excel spreadsheet (.xls): Export the document to a Microsoft Excel spreadsheet for further manipulation and use. See the Document Creation Guide for tips to create a document that will display correctly when it is exported to Excel.
  • HTML file: Export the document to an html file in a separate window.
  • Flash file: Export the document to a fully interactive, stand-alone Flash file. All the Flash files in a project are exported in one of the following formats:
  • MHT format, which can be opened in Internet Explorer, and in Firefox with a third-party plug-in

The document designer determines the Flash file format.

You are prompted for the following when you export a document, unless the document designer has specified the default exporting options for the document:

  • If your document is grouped, you can choose to export the entire document or only the selected group element. Page-by allows you to view the document by a selected group element.
  • If your document contains multiple layouts, you can choose to export the entire document or only the layout currently being viewed by the user.
  • You can export a document from within the document, or directly from a folder. The following procedures describe both tasks. You can also export a single Grid/Graph to a PDF file or an Excel spreadsheet.

To export an open document

  1. In MicroStrategy Web, click the name of the document to execute it.
  2. From the Home menu, select Export, then select the file format in which to export the document. The options are:
    • Excel
    • HTML
    • PDF
    • Flash

       If an export format is not available on the toolbar, that export format has not been made available for this document.

3. Depending on the file type, you may be prompted to open or save the exported file. You can choose to either:

  • Open the file, by clicking Open. A copy of the document opens in a browser.
  • Save the file, by clicking Save. Name and save the file. Do not change the file type.

To export a document directly from a folder

  1. In MicroStrategy Web, right-click the document and select Run as.
  2. If the document contains prompts, answer them and click Export.
  3. If DHTML is disabled, click the Export icon or PDF icon in the Actions column.

*        If an export format is not available on the toolbar, that export format has not been made available for this document.

4. If DHTML is enabled, select the format to export to:

    • To export to a PDF, select Export to PDF.
    • To export to Microsoft Excel, select Export to Excel.
    • To export to HTML, select Export to HTML.
    • To export to Flash, select Export to Flash.

5. Depending on the file type, you may be prompted to open or save the exported file. You can choose to either:

    • Open the file, by clicking Open. A copy of the document opens in a browser.
    • Save the file, by clicking Save. Name and save the file. Do not change the file type.

Exporting a Grid/Graph from a document

You can export a single Grid/Graph displayed in a document to either a PDF file or an Excel spreadsheet. This provides additional flexibility to share and print your data, so that you do not have to export the entire document.

The document must be displayed in Express or Flash Mode. In Express Mode, the title bar of the Grid/Graph must be displayed. The following procedures describe both tasks.

*         A widget is exported as a Grid/Graph.

To export a Grid/Graph from Express Mode

  1. In MicroStrategy Web, open the document in Express Mode.
  2. Click the icon on the title bar of the Grid/Graph and point to either Export to PDF or Export to Excel.
  3. If you choose PDF, the PDF file opens in another browser window.
  4. If you choose Excel, you are prompted to open or save the exported file. You can choose to either:
  5. Open the file, by clicking Open. A copy of the document opens in another browser window.
  6. Save the file, by clicking Save. Name and save the file. Do not change the file type.
  7. In MicroStrategy Web, open the document in Flash Mode.
  8. Hover your cursor over the Grid/Graph. Icons for exporting to Excel and to PDF are displayed.
  9. Do one of the following:

To export a Grid/Graph from Flash Mode

  • To export the Grid/Graph to Excel, click the Export to Excel icon. You are prompted to open or save the exported file. You can choose to either:
  • Open the file, by clicking Open. A copy of the document opens in another browser window.
  • Save the file, by clicking Save. Name and save the file. Do not change the file type.
  • To export the Grid/Graph to PDF, click the Export to PDF icon. The PDF file opens in another browser window.