
Jacob Gube: 6 Ways to Increase the Visual Weight of Something

Readers:

While perusing Zite, I came across this blog post on the Design Instruct web site by Jacob Gube (photo, right). Jacob is the co-founder and a managing editor of Design Instruct. He’s a web developer, and also the owner of Six Revisions. Follow Jacob on Twitter: @sixrevisions.

Best Regards,

Michael

6 Ways to Increase the Visual Weight of Something

In a design composition, the visual weight of an object refers to how well it draws attention to itself compared to other components of the composition. The “heavier” the object is, the more eye-grabbing it is.

When creating a design, it’s a good idea to prioritize key elements in the visual space by giving them heavier visual weights. For example, things you might consider giving heavier visual weights to — so that they’re more easily seen by the viewer — are call-to-action buttons in a web design, or the subject of a photograph.

I’ll talk about a few tricks for increasing the visual weight of an object.

1. Give It a Different Color

The higher the color contrast between an object and its surroundings (including its background), the better the object is able to garner our attention.

In the example above, notice how, even though the size, shape and margins of the stars are identical, the red star is able to get your attention simply because of how distinctive its color is compared to other elements in the composition.

2. Move It Away from Other Objects

One easy trick for increasing the visual weight of an object is distancing it from other objects. Adding plenty of negative space around the object separates it from other objects, which in turn makes the object stand out.

In the example above, look at how our eyes interpret the composition as two groups of rabbits: A big group of 12 rabbits and a small group consisting of only one rabbit. By being farther away from the others, the estranged rabbit is able to command our attention more than any other rabbit in the composition.

3. Make It Look Different

When things look alike, it’s naturally hard for us to differentiate them. So, quite simply, we can make the visual weight of an object heavier by making it look different from other objects.

Even a slight change in the style properties of an object can heavily influence its visual weight if objects in the composition look similar. In the above example, notice how the circle at the center of the first row is able to get our eyes’ attention compared to the other circles.

4. Point to It

A simple trick for increasing the visual weight of something is to direct the viewer’s eyes to it using visual cues such as arrows.

In the above example, check out how the visual weight of the house is increased because it’s surrounded by arrows that point to its location. No matter where our attention goes, we’re redirected to look at the house because of the arrows.

5. Make It Look Visually Complex

An ornate object attracts our eyes more when it’s set among simple and unadorned objects. We can make the appearance of an object complex by giving it textures or drop shadows, changing its shape, adding more color to it, and so forth.

In the example above, the multi-colored circle has the heaviest visual weight because the surrounding objects are styled plainly.

6. Make It Bigger

Making an object larger than the other objects around it will increase its visual weight. It’s a reasonable proposition: The more visual space an object takes up, the more visible it is.

In the example above, notice how our eyes are quickly drawn to the biggest heart. The only thing different about it is its size.

Visual weight is a simple but incredibly powerful design tool for strategically arranging elements so that more important elements are readily seen by our viewers.

What tricks do you use to increase the visual weight of an object? Share your advice in the comments.

Using the R Integration functionality, how to perform Text Mining on a MicroStrategy report and display the result

Readers:

Here is another great post in the MicroStrategy Community from Jaime Perez (photo, right) and his team. A lot of work went into the preparation of this post, and it shows some great ways to use the R integration with MicroStrategy.

Contributors from Jaime’s team include:

ssonobe

Lili

Joanne A

Ohingst

Enjoy!

Michael

Text Mining Using R Integration in MicroStrategy

Users may wish to perform text mining using R on the result of any arbitrary MicroStrategy report and display the result. One of the problems that hinders users from achieving this is that the number of output elements is not always consistent with the number of input rows. For example, a report may have three attributes named ‘Age groups’, ‘Reviewer’, and ‘Survey feedback’ and might display four rows of feedback as follows:

01.jpg

If the above report result is sent to R as an input and the R script breaks down each sentence of the feedback into term frequencies grouped by age group, the output will have 18 rows.

02.jpg

Since the number of output elements is greater than the number of the MicroStrategy report rows, the report execution will fail. Using the objects in the Tutorial project, this technical note (TN207734) describes one way to display the result of text mining on a MicroStrategy report, using the R integration functionality.

PREMISE:
– Following the instructions in TN43665, the MicroStrategy R Integration Pack has already been installed on the Intelligence Server.

The Steps Involved

STEP 1: Decide on the input values that need to be sent to R via R metrics
The first step is to decide which data you wish to perform text mining on. In this technical note, the sample report will let users select one year element and an arbitrary number of category elements, and specify the Revenue amount in prompts. The report will then display the value of the normalized TF-IDF (term frequency and inverse document frequency) for every word showing up in the qualified Item attribute elements, grouped by the Category elements.
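For reference, one common TF-IDF formulation is sketched below in R; the technote does not spell out the exact variant it uses, so treat this as an illustrative assumption rather than the note's own formula.

# One common TF-IDF formulation: raw term frequency scaled by the log of
# (number of documents / number of documents containing the term).
tfidf <- function(tf, df, N) tf * log(N / df)

# Example: a term appearing 3 times in 1 of 3 category "documents".
tfidf(tf = 3, df = 1, N = 3)   # 3 * log(3), roughly 3.30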

A user may select the following values for each prompt and the report may look as shown below.

  • Year: 2012
  • Category: Books, Movies, and Music
  • Revenue: greater than $15,000

03.jpg

Eventually, the user may want to see the normalized TF-IDF for every word showing up in the Item attribute elements as shown below:

04.jpg

Since the final output displays each word from the Item attribute, grouped by the Category elements, the necessary input values to R are as follows:

  • The elements of the Category attribute.
  • The elements of the Item attribute.

 

STEP 2: Create metrics to pass the input values to R

The input values to R from MicroStrategy must be passed via metrics. Hence, on top of the current grid objects, additional metrics need to be created. For this sample report, since the inputs are the elements of two attributes, create two metrics with the following definitions so that the elements are displayed as metrics.

Max(Category@DESC) {~}

Max(Item@DESC) {~}

05.jpg

 

STEP 3: R script – Phase 1: Define input and output variables and write R script to obtain what you wish to display in a MicroStrategy report

In the R script, define (1) a variable that receives the inputs from MicroStrategy and (2) a variable that will be sent back to MicroStrategy as the output, as depicted below. Since the number of output elements must match the number of input elements, the output is defined as “output = mstrInput2” to avoid errors. In other words, this script executes R functions to obtain the data that you wish to display in a MicroStrategy report, but the output it returns is simply the input. More details about how to display the result in a MicroStrategy report follow later in this technical note.

06.jpg
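A minimal sketch of that pass-through pattern is shown below. The name mstrInput2 comes from the text above; mstrInput1 is an assumed companion name, and the sample vectors are stand-ins for what MicroStrategy actually sends, so the snippet runs standalone.

# Stub inputs; in the deployed script these vectors come from MicroStrategy.
mstrInput1 <- c("Books", "Books", "Movies")          # Category elements
mstrInput2 <- c("SciFi Novel", "Cookbook", "Drama")  # Item elements

# ... text-mining work happens here, producing norm.TF.IDF ...

# The output must have one element per input row, so simply echo an input.
output <- mstrInput2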

 

In this technical note, after manipulating the input values, we assume that the variable named ‘norm.TF.IDF’ in the R script holds the TF-IDF values for each term.

07.jpg
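As a rough standalone illustration (not the technote's exact script), a norm.TF.IDF matrix could be built in base R as follows; the max-scaling at the end is just one possible normalization.

# Toy inputs: one text fragment per item, each belonging to a category.
categories <- c("Books", "Books", "Movies")
items      <- c("great fun read", "fun recipes", "great acting")

# One "document" per category: concatenate the item text.
docs <- tapply(items, categories, paste, collapse = " ")

# Term frequency matrix: rows = terms, columns = categories.
terms     <- strsplit(tolower(docs), "\\s+")
all.terms <- sort(unique(unlist(terms)))
tf        <- sapply(terms, function(x) table(factor(x, levels = all.terms)))

# Inverse document frequency, then TF-IDF scaled into [0, 1].
idf         <- log(ncol(tf) / rowSums(tf > 0))
norm.TF.IDF <- (tf * idf) / max(tf * idf)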

 

STEP 4: Create tables in the data warehouse to store the value of your R output

In order to display the values of ‘norm.TF.IDF’ in a MicroStrategy report, tables to hold the result need to be created in the data warehouse. In other words, an additional report will later be created in MicroStrategy, and it will extract the data from the database tables created in this section.

In this specific example, the variable ‘norm.TF.IDF’ has the elements of words (terms) and categories and the values of the normalized TF-IDF. Considering the types of data, the first two should be displayed as attributes and the values of the normalized TF-IDF should be presented in a metric. Hence, two lookup tables to hold the term and category elements and one fact table need to be created to store all the data. On top of these tables, one relationship table is also required since the relationship between words and categories is many-to-many.
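A minimal sketch of that DDL is below, issued through RODBC just as the population script in Step 5 does. The table names match the ones used later in this note; the column names and types are assumptions for illustration (SQL Server syntax), so adjust them to your own schema.

# Create the two lookup tables, the relationship table, and the fact table.
library(RODBC)
ch <- odbcConnect("DSN_name")

sqlQuery(ch, "CREATE TABLE tm_Category (Category_ID int, Category_DESC varchar(255))")
sqlQuery(ch, "CREATE TABLE tm_Word     (Word_ID int, Word_DESC varchar(255))")
sqlQuery(ch, "CREATE TABLE tm_Word_Cat (Word_ID int, Category_ID int)")
sqlQuery(ch, "CREATE TABLE tm_Fact     (Word_ID int, Category_ID int, weight float)")

odbcClose(ch)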

 

STEP 5: R script – Phase 2: Populate the tables in your R script

As previously mentioned, the variable named ‘norm.TF.IDF’ contains the values that a user wishes to display in a MicroStrategy report, as shown below.

07.jpg

 

In this R script, four more variables are defined from ‘norm.TF.IDF’, each of which contains a subset of the data that will be inserted into the database tables (an illustrative derivation follows the four descriptions below).

 

tm_Category holds the unique elements of the Category.

10.jpg

 

tm_Word holds the unique elements of the Word (Term).

11.jpg

 

tm_Word_Cat stores the values of the many-to-many relationship.

12.jpg

 

tm_Fact contains the values of TF-IDF for every Word-Category combination.

13.jpg
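An illustrative derivation of those four data frames from a norm.TF.IDF matrix (as sketched in Step 3) is shown below; the surrogate IDs are generated on the fly here, and real scripts may assign them differently.

# Lookup tables: generate surrogate IDs for categories and words.
tm_Category <- data.frame(Category_ID   = seq_len(ncol(norm.TF.IDF)),
                          Category_DESC = colnames(norm.TF.IDF))
tm_Word     <- data.frame(Word_ID   = seq_len(nrow(norm.TF.IDF)),
                          Word_DESC = rownames(norm.TF.IDF))

# Long form: one row per word/category pair with a non-zero weight.
nz          <- which(norm.TF.IDF > 0, arr.ind = TRUE)
tm_Word_Cat <- data.frame(Word_ID = nz[, "row"], Category_ID = nz[, "col"])
tm_Fact     <- data.frame(tm_Word_Cat, weight = norm.TF.IDF[nz])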

 

In the R script, populate the database tables with the above four subsets of ‘norm.TF.IDF’.

# Load RODBC
library(RODBC)

# RODBC package: assign ch the connectivity information
ch <- odbcConnect("DSN_name")

# Delete all the rows of the tables
sqlClear(ch, "tm_Category", errors = TRUE)
sqlClear(ch, "tm_Word",     errors = TRUE)
sqlClear(ch, "tm_Word_Cat", errors = TRUE)
sqlClear(ch, "tm_Fact",     errors = TRUE)

# SQL: insert the data into tables; use parameterized query
sqlSave(ch, tm_Category, tablename = "tm_Category", rownames=FALSE, append=TRUE, fast = TRUE)
sqlSave(ch, tm_Word,  tablename = "tm_Word", rownames=FALSE, append=TRUE, fast = TRUE)
sqlSave(ch, tm_Word_Cat, tablename = "tm_Word_Cat", rownames=FALSE, append=TRUE, fast = TRUE)
sqlSave(ch, tm_Fact, tablename = "tm_Fact", rownames=FALSE, append=TRUE, fast = TRUE)

#Close the channel
odbcClose(ch)

 

STEP 6: Create and add an R metric, which implements the R script

The R script is done. It is time to implement this R script from MicroStrategy by creating an R metric. In the deployR interface, open the R script and define the input and output that you specified in Step 3 as follows. Since the elements of the Category and Item attributes are characters, choose “String” as their data type. Likewise, since the output is the same as mstrInput2, its data type is also set to String.

14.jpg

 

Create a stand-alone metric and paste in the metric definition generated by the deployR utility. Then, replace the last parameters with the Category and Item metrics that you created in Step 2.

15.jpg

 

Add the R metric to the report.

15.2.png

 

The report and R will perform the following actions after the R metric is added:
i. The report lets users select the prompt answers
ii. MicroStrategy sends the Category and Item elements to R via the R metric
iii. R performs text mining to calculate the TF-IDF based on the inputs
iv. R generates subsets of the TF-IDF
v. R truncates the database tables and populates them with the subsets of the TF-IDF
vi. R sends the output (which is actually the input) back to MicroStrategy
vii. The report displays the values of all objects, including the R metric

 

STEP 7: Create MicroStrategy objects to display the data

From the tables created in Step 4, create the Word and Category attributes and the fact named weight. The object relationship is as depicted below.

08.jpg

09.jpg

 

Now, create a new report with these objects. This report will obtain and display the data from the database tables.

16.jpg

 

STEP 8: Utilize the report level VLDB properties to manipulate the order of the report execution jobs

There are now two reports; let them be named R1 and R2, as described below:

  • R1: A report which prompts users to specify the report requirements and implements the R script executing text mining
  • R2: A report which obtains the result of text mining from the database and displays it

 

If the two reports are placed in a document as datasets as shown below, there is one problem: R2 may start its execution before R1 populates the database tables with the result of text mining.

17.jpg

 

In order to force R2 to execute its job after the completion of R1, the PRE/POST statement VLDB properties, along with an additional database table, may be used. The table tm_Flag contains the value 0 or 1, and R2 is triggered when R1 sets the value of completeFlag to 1. The detailed steps are described below with scripts for SQL Server.

 

i. Create another table in the database, which holds the value of 1 or 0

CREATE TABLE tm_Flag
(
   completeFlag int
)


INSERT INTO tm_Flag VALUES(0)

 

ii. In the VLDB property ‘Report Post Statement 1’ of the R1 report, define a Transact-SQL statement that changes the value of completeFlag to 1.

DECLARE @query as nvarchar(100)
SET @query = 'UPDATE tm_Flag SET completeFlag = 1'
EXEC sp_executesql @query

 

iii. Define the VLDB property ‘Report Pre Statement 1’ in R2 so that it will check the value of completeFlag every second and loop until the value turns to 1. After the loop, it reverts the value of completeFlag back to 0. After this Report Pre Statement runs, R2 obtains the data from the database tables that R1 populated.

DECLARE @intFlag INT
SET @intFlag = (select max(completeFlag) from tm_Flag)

WHILE(@intFlag = 0)
BEGIN
	WAITFOR DELAY '00:00:01'
	SET @intFlag = (select max(completeFlag) from tm_Flag)
END

DECLARE @query as nvarchar(100)
SET @query = 'UPDATE tm_Flag SET completeFlag = 0'
EXEC sp_executesql @query

 

Activity Diagram

18_revised.png

 

 

Overall execution flow

  1. Answer prompts

19.png

 

  2. Only the text mining result is displayed to users

20.png

 

Third Party Software Installation:

WARNING: The third-party product(s) discussed in this technical note is manufactured by vendors independent of MicroStrategy. MicroStrategy makes no warranty, express, implied or otherwise, regarding this product, including its performance or reliability.

 

Tips & Tricks #12: How to Troubleshoot Cross Joins in SQL Reports for the SQL Generation Engine 9.x

MicroStrategy Community Banner

Readers:

In my last blog post, I wrote about the new MicroStrategy Community. Jaime Perez, VP of Worldwide Customer Services, and his crew have come up with a better way for us to engage with MicroStrategy and with his team.

Speaking of Jaime, last June he posted this great tip on the MicroStrategy Knowledgebase site as a TechNote. I am reblogging it since it addresses one of the most frequent questions I get asked, and I find it an extremely useful Tip & Trick. Also, this will give you an idea of the great stuff being posted in the MicroStrategy Community.

Best Regards,

Michael

MicroStrategy and Cross Joins

In some scenarios, one may encounter cross joins in the SQL View of a standard SQL Report in MicroStrategy.  Cross joins appear when two tables do not have any common key attributes on which they can inner join.  As a result, the two tables essentially combine to create one table that has all the data from both tables, but this results in poorer performance, with a common effect of increased execution times.  Sometimes these execution times, and performance hits, can be very severe.  Therefore, it is important to understand some simple steps that can be performed to resolve a cross join, as well as some steps to understand why it may be appearing in the SQL View of the report.
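The row explosion is easy to see in miniature; here is a toy sketch in R (hypothetical tables, shown only to illustrate what a cross join produces).

# Two tables with no common key attribute.
region  <- data.frame(region  = c("East", "West"))
product <- data.frame(product = c("Books", "Music", "Movies"))

# merge() with by = NULL performs a Cartesian (cross) join:
# 2 x 3 = 6 rows, every combination of rows from both tables.
merge(region, product, by = NULL)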

One common occurrence of a cross join is when a report contains at least two unrelated attributes in the grid, and no metrics are present to relate the unrelated attributes via a fact table.  Such an occurrence can be resolved in a few ways:

  1. Create a relationship filter, set the output level as the unrelated attributes (or the entire report level), and then relate these by a Logical Table object
  2. Create a relationship filter, set the output level as the unrelated attributes (or the entire report level), and then relate these by a Fact object
  3. Add a metric to the report that uses a fact from a table to which both attributes can inner join

This provides a pathway from the fact table to the lookup tables from which the unrelated attributes are sourced.  The result is an inner join between the fact table and the lookup tables, which resolves the cross join between the two unrelated lookup tables.

Options 1 and 2 provide a means by which the report template can remain attributes-only, whereas Option 3 places a metric on the report.  Option 3 may not be desirable if you do not want a metric on the report.  Keep in mind that other techniques can be employed to keep the metric on the report but format it to be hidden from display.

More common scenarios include cross joins between a fact table and a lookup table, and these are typically surprising to a developer.  Such situations can be a bit trickier to troubleshoot and resolve, but here are a few techniques that can be employed to try to resolve the issue:

  1. In SQL View look at where the cross join appears, and between which tables the cross join appears
  2. Open up those tables in the Table Editor by navigating to the Schema Objects\Tables folder, and double-clicking the tables
  3. Select the Logical View Tab of both tables to see all the logical objects mapped to the table
  4. Take note of which attributes have a key icon beside them
  5. These key attributes denote attributes at the lowest level of their hierarchy presently mapped to the table and/or attributes that are in their own hierarchy (meaning they have no parents or children)
  6. The SQL Engine will join 2 tables on common key attributes only, so if none of the key attributes on either table exist on both tables, then a cross join should appear

This means that just because a Region attribute exists on Table_A and a Region attribute exists on Table_B does not necessarily mean that the SQL Engine will join on Region.  If Region has its child attribute on the table, then that attribute should be the key as it is the lowest level attribute of its particular hierarchy mapped to the table.  If Region exists on both tables, and is also a key attribute on both tables, then an inner join should take place on Region.

This essentially means that one can find a cross join, investigate the tables in which it appears, and verify whether at least one common key attribute exists between the tables.  If not, then that should be the first path to investigate, because a cross join is correct in that scenario.

Video

You can find a detailed video on how this issue is reproduced and resolved here: Tech Note 71019: Steps to reproduce and resolve.

Note

MicroStrategy Technical Support can assist with resolving cross joins in a specific report; however, caution should be taken when resolving such issues.  In some scenarios, the cross join is resolved through modifications to the schema objects, which can have a ripple effect on all other reports in an environment.  For example, if a relationship is changed in the Region attribute to resolve a cross join in one report, this change will be reflected in all other reports that use Region, and potentially the hierarchy to which Region belongs.  As a result, the SQL View of one report will have the cross join resolved, but the SQL may have changed in other reports using Region or its related attributes.  This may or may not be desired.  MicroStrategy Technical Support may not be able to fully understand the impact of such a schema change on the data model, so before a change is made to the data model, the consequences of such a change should be fully understood by the developer, and any changes made to the schema should be recorded.

References

[1] Jaime Perez, TN47356: How to troubleshoot cross joins in SQL Reports for the SQL Generation Engine 9.x, MicroStrategy Community, 06/24/2014, http://community.microstrategy.com/t5/Architect/TN47356-How-to-troubleshoot-cross-joins-in-SQL-Reports-for-the/ta-p/196989.

[2] MicroStrategy Knowledgebase, Tech Note 71019: Steps to reproduce and resolve.

MicroStrategy Simplifies Product Packaging to Enhance Total Customer Experience

New MicroStrategy Pricing

Converges on Four Products; Offers Free Upgrades to Premium Capabilities for Existing Clients

BARCELONA, Spain, July 8, 2014 – MicroStrategy® Incorporated (Nasdaq: MSTR), a leading worldwide provider of enterprise software platforms, today announced a new packaging structure aimed at delivering the best end-to-end customer and partner experience, making it easier than ever to acquire, deploy, and succeed with MicroStrategy. MicroStrategy also announced that it is extending free upgrades for existing clients to the premium capabilities included in the new product packaging, offering greater value to clients and new users.

The packaging changes will empower new and existing MicroStrategy clients to realize the full potential of their analytical applications using the most comprehensive analytics and mobile platforms in the industry. This information, and more, can be found at: www.microstrategy.com/experience.

“Our new packaging makes it simple for organizations to choose MicroStrategy for the totality of their business analytics and mobile application needs,” said Paul Zolfaghari, President, MicroStrategy Incorporated. “We believe it instantly enhances our value to existing customers and is emblematic of our heightened focus on delivering a positive customer experience. The new packaging allows better preparation and planning for new deployments, providing more value over the broad range of solutions we offer.”

Under the new packaging structure, MicroStrategy’s full feature set, previously split into 21 discrete offerings, has been reduced to four simple packages that empower developers, analysts, power users, and consumers to take advantage of the comprehensive MicroStrategy platform with simplified value-based pricing.

“From a customer perspective this is a welcome change,” said Andrew Young, BI Director, at Bob Evans Farms, a MicroStrategy client. “Budgeting and planning new applications will be far easier, especially breaking down platform investments to our business customers. With the simplified offering and pricing structure we can paint a more complete picture and focus on the business value.”

MicroStrategy added that the new packaging allows clients to more affordably deploy the full breadth of MicroStrategy capabilities (including data federation, write-back, closed-loop analysis, and automated report distribution, among others) to more users across the enterprise, giving end users full authoring capabilities as needed and integration with Microsoft® Office® applications. System architects gain efficiencies with the full ability to manage upgrades, migrations, and data loads, as well as free server administration and monitoring features. Within the four product offerings, clients will have all styles of analytics (self-service, dashboards, advanced analytics) across any interface (web, mobile, pdf, email report distribution) at Big Data scale—on an automated platform.

“To maximize the value of a customer relationship requires that companies simplify the pricing to ensure the purchasing process of technology is easy and transparent,” said Mark Smith, CEO and Chief Research Officer at Ventana Research. “The new MicroStrategy packaging and pricing enable the best possible customer experience, while shortening the time to gain full value from technology for organizations.”

The new packaging focuses on user roles within an enterprise:

  • MicroStrategy Server™ benefits all user roles. It includes a fully-featured server infrastructure designed to connect to multiple data sources, supports all major analytic styles from report distribution to information-driven apps to self-service data discovery, and scales to hundreds of thousands of users. It also includes administration and monitoring tools needed by organizations to effectively and efficiently manage their deployments.
  • MicroStrategy Web™ empowers business users to consume, author, and design analytics through an intuitive web-based interface. Business analysts can use MicroStrategy Web to take advantage of the all-inclusive set of self-service analytic capabilities.
  • MicroStrategy Mobile™, the award-winning, industry-leading interface for Apple iOS and Android devices, is an easy, fast, affordable way to mobilize analytics and information-driven applications to an increasingly mobile and 24 x 7 workforce.
  • MicroStrategy Architect™ provides developers with an extensive set of development, deployment and migration tools needed to efficiently manage the application development lifecycle.

The Company also noted that these four offerings complement the free MicroStrategy Analytics Desktop™ it made available last year. MicroStrategy Analytics Desktop is a free self-service business analytics tool designed to enable any individual user to gain deep insight into the user’s data by effortlessly creating powerful, insightful visualizations and dashboards.

An Introduction to Data Blending – Part 6 (Data Blending using MicroStrategy)

Readers:

In Part 5 of this series on data blending, we reviewed Tableau’s Data Blending Architecture. With Part 5, I have wrapped up the Tableau portion of this series.

I am now going to post, over the next week or so, several parts discussing how we do data blending using MicroStrategy. Fortunately, MicroStrategy just published a nice technical note on their Knowledgebase (TN Key: 46940) [1] discussing this. Most of what I am sharing today is derived from that technical note.

I probably will have 2-4 parts for this topic in my Data Blending series including how the MicroStrategy Analytical Engine deals with multiple datasets.

I want to thank Kristi Morton (et al) for the wonderful research paper she wrote at The University of Washington [2]. It helped me provide some real insight into the topic and mechanics of data blending, particularly with Tableau. You can learn more about Ms. Morton’s research as well as other resources used to create this blog post by referring to the References at the end of the blog post.

So, let’s now dig into how MicroStrategy provides us data blending capabilities.

Best Regards,

Michael

Data Blending using MicroStrategy

In Part 6, we will begin examining using data blending in MicroStrategy. We will first look at how to use attributes from multiple datasets in the same Visual Insight dashboard and link them to existing attributes using the Data Blend feature in MicroStrategy Analytics Enterprise Web 9.4.1.

Prior to v9.4.1 of MicroStrategy, data blending was referred to as Cube Joining.

In MicroStrategy Analytics Enterprise Web 9.4.1, the new Report Services Documents Engine automatically links common attributes using the modeled schema whenever possible. Manual linking is not allowed between different modeled attributes. If a requirement calls for linking different attributes, this can be done using MicroStrategy Architect at the schema level. By default, related attributes are linked using a full outer join. If there is no relationship between the attributes, a cross join is used.

The manual attribute linking can be done as shown in the images below.

Part 6 - 1

 

2. Browse the file to match the existing data and select Continue.

Part 6 - 2

 

3. Set the attribute forms if needed. MicroStrategy will automatically assign the detected ones.

Part 6 - 3

4. The attributes can be mapped manually by selecting Link to Project Attribute.

Part 6 - 4

5. Select the attribute form that matches the desired join:

Part 6 - 5

6. The attribute should appear similar to the ones existing in the schema as shown below.

Part 6 - 6

 

7. Save the recently created dataset.

Part 6 - 7

8. Now there are two cubes used as datasets in the same Visual Insight dashboard, as shown below.

Part 6 - 7a

Automatic Linking

The attribute icons now have a blue link, as shown below. This indicates that MicroStrategy has automatically linked them to elements in the Information dataset.

Part 6 - 8

Next: How Data Blending Affects the Analytical Engine’s Behavior in MicroStrategy

———————————————————————————-

References:

[1] MicroStrategy Knowledgebase, How to use attributes from multiple datasets in the same Visual Insight dashboard and link them to existing attributes using the Data Blend feature in MicroStrategy Analytics Enterprise Web 9.4.1, TN Key: 46940, 04/24/2014, https://resource.microstrategy.com/support/mainsearch.aspx.

NOTE: You may need to register to view MicroStrategy’s Knowledgebase.

[2] Kristi Morton, Ross Bunker, Jock Mackinlay, Robert Morton, and Chris Stolte, Dynamic Workload Driven Data Integration in Tableau, University of Washington and Tableau Software, Seattle, Washington, March 2012, http://homes.cs.washington.edu/~kmorton/modi221-mortonA.pdf.

An Introduction to Data Blending – Part 5 (Tableau’s Data Blending Architecture)

Readers:

In Part 4 of this series on data blending, we reviewed Tableau’s Data Blending Principles. We also reviewed an example of data blending in Jonathan Drummey’s Patient Falls Dashboard. [3]

Today, in Part 5 of this series, we will peel the onion a bit more and look at Tableau’s Data Blending Architecture.

Again, much of Parts 1 – 5 are based on a research paper written by Kristi Morton from The University of Washington (and others) [1].

You can learn more about Ms. Morton’s research as well as other resources used to create this blog post by referring to the References at the end of the blog post.

Best Regards,

Michael

Integrating Data in Tableau

In Part 5, we discuss in greater detail how data blending works. Then we discuss how a user builds visualizations with data blending, using several large airline statistics datasets.

Data Blending Architecture

Part 5 - Figure 1

The data blending system, shown in Figure 1 above, takes as input the VizQL query workload generated by the user’s GUI actions and data source schemas, and automatically infers how to query the data sources remotely and combine their results on-the-fly. The system features a two-tier mediator-based architecture in which the VizQL query workload is analyzed and partitioned at runtime based on the corresponding data source fields being used. The primary mediator initiates this process by removing the visual encodings from the VizQL query workload to yield an abstract query. The abstract query is partitioned for further processing by the primary mediator and one or more secondary mediators. The primary mediator creates the mediated schema for the given query workload. It then federates the abstract queries to the primary data source as well as the secondary mediators and their respective data sources. The wrappers compile the abstract queries into concrete SQL or MDX queries and instantiate the semantic mappings between the data sources and the mediated schema for each query. The primary mediator joins all the result sets returned from all data sources to produce the mediated result set used by the rendering system. [1]

Part 5 - Figure 2

Post-aggregate Join

A visualization is organized by its discrete fields into pages, partitions, colors, etc., and like a GROUP BY clause in SQL, these grouping fields comprise the primary key of the visualization. In a blended visualization, the grouping fields from the primary data source become the primary key of the mediated schema. In Figure 2 above, these are shown as the dark-green fields in the primary data source, and the light green fields represent the aggregated data. Each secondary data source must contain at least one field that matches a visualization grouping field in order to blend into the mediated schema. The matching fields in a secondary data source comprise its join key, and these fields appear in the GROUP BY clause issued by the secondary mediator wrappers. The aggregated data from the secondary data source, shown in light-purple, is then left-joined along its join key into the mediated result set. Morton (et al) refer to this left-join of aggregated result sets as a post-aggregate join. [1]
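The mechanics can be sketched in a few lines of R; this is a toy stand-in for the mediator, with made-up airline numbers, not Tableau's implementation.

# Toy primary (ticket pricing) and secondary (fuel cost) sources.
primary   <- data.frame(year    = c(2010, 2010, 2011, 2012),
                        airfare = c(300, 340, 310, 355))
secondary <- data.frame(year = c(2010, 2011, 2011),
                        fuel = c(2.1, 2.4, 2.6))

# Each source is aggregated to the grouping field (year) first.
p.agg <- aggregate(airfare ~ year, primary, mean)
s.agg <- aggregate(fuel ~ year, secondary, mean)

# Post-aggregate join: left-join the aggregated secondary along the key.
# 2012 keeps its airfare row with an NA fuel value rather than being dropped.
merge(p.agg, s.agg, by = "year", all.x = TRUE)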

Primary Key Cardinality

There is a many-to-one mapping between the domain values of the primary key and those of the secondary join key, because the secondary join key is a subset of the primary key and contains only unique values in the aggregated secondary result set. Morton (et al) find that this approach is the most natural for augmenting a visualization with secondary data sources of uncertain value or quality, which is a common scenario for Tableau users.

Data blending supports many-to-one relationships between the primary and each secondary. This can occur when the secondary data source contains coarser-grained data than the mediated result set, as discussed in Part 3 of this series.

Since the join key in a secondary result set may match a subset of the blended result set’s primary key, portions of the secondary result set may be duplicated across repeated values in the mediated result set. This does not pose a risk of double-counting measure values, because all aggregation is performed prior to the join. When a blended visualization uses multiple secondary data sources, each secondary join key may match any subset of the primary key. The primary mediator handles duplicating each secondary result set as needed to join with the mediated result set.

Finally, a secondary dimension which is not part of the join key (and thus not a grouping field in the secondary query) can still be used in the visualization. If it is functionally dependent on the join key, a secondary dimension can be used without affecting the result set cardinality. Tableau references this kind of non-grouping dimension using both MIN and MAX aggregations in the query issued to the secondary data source, which allows Tableau to determine if the dimension is functionally dependent on the join key. For each row in the secondary result set, if the two aggregated values are the same then the value is used as-is, reflecting the functional dependence on the grouping fields. If the aggregated values differ, Tableau represents the value using a special form of NULL called ManyValues. This is represented in the visualization as a ‘*’, but retains the behavior of NULL when used in calculated fields or other computations. The visual feedback allows a user to distinguish this lack of data from the NULLs which occur due to missing or mismatched data.
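The MIN/MAX check can be sketched in R as follows (hypothetical data; in Tableau the comparison happens inside the generated secondary query).

# A non-grouping secondary dimension: carrier name keyed by carrier ID.
sec <- data.frame(carrier = c("AA", "AA", "UA"),
                  name    = c("American", "American Air", "United"))

# If MIN == MAX within a group, the value is functionally dependent on the
# join key and used as-is; otherwise it is shown as '*' (ManyValues).
aggregate(name ~ carrier, sec,
          function(x) if (min(x) == max(x)) x[1] else "*")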

Inferring Join Keys

Tableau uses very simple rules for automatically detecting candidate join keys:

  1. The secondary data source field name must match a field with the same name in the primary data source.
  2. The data types must match.
  3. If they are date/time fields, they must represent the same granularity date bin in the date/time hierarchy, e.g. both are MONTH.

A user can intervene to force a match either by providing field captions to rename fields within the Tableau data model, or by explicitly defining a link between fields using a simple user interface.

Part 5 - Figure 3

Another Simple Blending Example

A Tableau data blending scenario is shown in Figure 3 above, which includes multiple views that were composed in minutes by uniquely mashing up four different airline datasets, the largest of which include a 324 million row ticket pricing database and a 140 million row on-time performance database. A user starts by dragging fields from any dataset on to a blank visual canvas, iteratively building a VizQL statement which ultimately produces a visualization. In this example, the user first drags the VizQL fields, YEAR(Flight Date) and AVG(Airfare), from the pricing dataset onto the visual canvas.

Data blending occurs when the user adds fields from a separate dataset to an existing VizQL statement in order to augment their analysis. Tableau assigns the existing dataset to the primary mediator and uses secondary mediators to manage each subsequent dataset added to the VizQL. The mediated schema has a primary key composed of the grouping VizQL fields from the primary dataset (e.g. YEAR(Flight Date)); the remaining fields in the mediated schema are the aggregated VizQL fields from the primary dataset along with the VizQL fields from each secondary dataset.

Continuing our example, the user wishes to drag AVG(Total Cost per Gallon) from the fuel cost dataset to the visualization. The schema matching algorithm examines the secondary dataset for one or more fields whose name exactly matches a field in the primary key of the mediated schema. While the proposed matches are often sufficient and acceptable, the user can specify an override. Since the fuel cost dataset has a field named Date, the user provides a caption of Flight Date to resolve the schema discrepancy. At this point the mediated schema is created and the VizQL workload is then federated to the wrappers for each dataset. Each wrapper compiles VizQL to SQL or MDX for the given workload, executes the query, and maps the result set into the intermediate form expected by the primary mediator.

The mapping is performed dynamically, since both the VizQL and the data model evolve during a user’s iterative analytical workflow. Finally, the primary mediator performs a left-join of each secondary result set along the primary key of the mediated schema. In this example, the mediated result set is rendered to produce the visualization shown in Figure 3(a).

Evolved Blending Example

Figure 3(b) above shows further evolution of the analysis of airline datasets, and demonstrates several key points of data blending. First, the user adds a unique ID field named uniquecarrier from the primary dataset to the VizQL to visualize results for each airline ID over time. The mediated schema adapts by adding this field to its primary key, and the secondary mediator automatically queries the fuel cost dataset at this finer granularity since it too has a field named uniquecarrier. Next, the user decorates the visualization with descriptive airline names for each airline ID by dragging a field named Carrier Name from a lookup table.

This dataset is at a coarser granularity than the existing mediated schema, since it does not represent changes to the carrier name over time. Morton’s (et al) system automatically handles this challenge by allowing the left-join to use a subset of the mediated result set primary key, and replicating the carrier name across the mediated result set. Figure 4 below demonstrates this effect using a tabular view of a portion of the mediated result set, along with portions of the primary and secondary result sets.

The figure also demonstrates how the left-join preserves data for years which have no fuel cost records. Last, the user adds average airline delays from a 140 million row dataset which matches on Flight Date and uniquecarrier. This is a fast operation, since the wrapper performs mapping operations on the relatively small, aggregated result set produced by the remote database. Note that none of these additional analytical tasks required the user to intervene in data integration tasks, allowing their focus to remain on finding insight in the data.

Part 5 - Figure 4

Filtering

Tableau provides several options for filtering data. Data may be filtered based on aggregate conditions, such as excluding airlines having a low total count of flights. A user can filter aggregate data from the primary and secondary data sources in this fashion, which results in rows being removed from the mediated result set. In contrast, row-level filters are only allowed for the primary data source. To improve performance of queries sent to the secondary data sources, Tableau will filter the join keys to exclude values which are not present in the domain of the primary data source result set, since these values would be discarded by the left-join.

Data Cleaning Capabilities

As mentioned in the Inferring Join Keys section above, Tableau supports user intervention in resolving field names when schema matching fails. And once the schemas match and data is blended, the visualization can help provide feedback regarding the validity of the underlying data values and domains. If there are any data inconsistencies, users can provide aliases for a field’s data values which will override the original values in any query results involving that field. The primary mediator performs a left-join using the aliases of the data values, allowing users to blend data despite discrepancies from data entry errors and spelling variations. Tableau provides a simple user interface for editing field aliases. Calculated fields are another aspect of Tableau’s data model which support data cleaning. Calculated fields support arbitrary transformations of original data values into new data values, such as trimming whitespace from a string or constructing a date from an epoch-based integer timestamp.

As with database fields, calculated fields can be used as primary keys or join keys.

Finally, Tableau allows users to organize a field’s related data values into groups. These ad-hoc groups can be used for entity resolution, such as binding multiple variations of business names to a canonical form. Ad-hoc groups also allow constructing coarser-grained structures, such as grouping states into regions. Data blending supports joins between two ad-hoc groups, as well as joins between an ad-hoc group and a string field.

 Next: Data Blending Using MicroStrategy

———————————————————————————-

References:

[1] Kristi Morton, Ross Bunker, Jock Mackinlay, Robert Morton, and Chris Stolte, Dynamic Workload Driven Data Integration in Tableau, University of Washington and Tableau Software, Seattle, Washington, March 2012, http://homes.cs.washington.edu/~kmorton/modi221-mortonA.pdf.

[2] Hans Rosling, Wealth & Health of Nations, Gapminder.org, http://www.gapminder.org/world/.

[3] Jonathan Drummey, Tableau Data Blending, Sparse Data, Multiple Levels of Granularity, and Improvements in Version 8, Drawing with Numbers, March 11, 2013, http://drawingwithnumbers.artisart.org/tableau-data-blending-sparse-data-multiple-levels-of-granularity-and-improvements-in-version-8/.

 

Tips & Tricks #10: How to Remove the Underlining from Hyperlinks Created on an Attribute or Metric

This is another one of those little tricks that can save you from pulling your hair out trying to figure it out.

By default, hyperlinks in a Report Services document are underlined. The underlining cannot be removed by changing the format of the attribute or metric.

Tip 10-1

There is an Enhancement Request open with MicroStrategy on this, but MicroStrategy does currently provide a workaround.

Workaround to Remove the Underlining is to Clear the Default Link

Step 1: Right click the object that is linked and choose Edit Links.

Step 2: Click the Clear Default button while the link is highlighted.

Tip 10-2

 

Step 3: The underlining should now be removed as shown below.

Tip 10-3

CAVEAT: The only limitation to this workaround is that, in order to get to the link, the object now has to be right-clicked. The user cannot directly click on the object and be taken to the new page (see screenshot below).

Tip 10-4

Charles Apple: Two Recent Infographic Fails You Ought to Know About

Readers:

I have been a big fan of Charles Apple’s work for a long time. I have blogged about him and his work in the past (see “Charles Apple” in my Categories on the right or do a search for “Charles Apple” on my blog).

Charles Apple (photo, right) is a longtime news artist, graphics reporter, designer, editor and blogger. The former graphics director of the Virginian-Pilot and the Des Moines Register, he spent five years as an international consultant and instructor. Currently, he’s Focus page editor of the Orange County Register.

I always like to reshare articles and blogs about what NOT to do in regards to data visualization and infographics. This morning, Mr. Apple posted a blog entry titled “Two recent infographic fails you ought to know about.” Charles has always shown a keen eye for detail and accuracy. He is also very reflective of his own work as today’s blog entry shows.

I hope you enjoy Mr. Apple’s thoughts as much as I do.

Have a great Good Friday and Happy Easter.

Best Regards,

Michael

Source: Charles Apple, Two recent infographic fails you ought to know about, http://www.charlesapple.com, April 18, 2014, http://www.charlesapple.com/2014/04/two-recent-infographic-fails-you-ought-to-know-about/.

Two recent infographic fails you ought to know about

A couple of charting debacles popped up this week of which you might want to take note.

POSITIVE VS. NEGATIVE SPACE

First, Reuters moved this fever chart showing the number of gun deaths in Florida going up after the state enacted its “stand your ground” law in 2005.

Just one little problem: The artist — for some unknown reason — elected to build the chart upside down from the usual way a fever chart is drawn.

1404GunDeaths01

Meaning the chart appears to show the number of gun deaths going down… if you focus on the white territory and consider the red to be the background of the chart.

After a lively discussion on a number of forums — most notably at Business Insider — a reader volunteered to flip the chart right-side around for clarity’s sake.

1404GunDeaths02

Is that better? Most folks seem to think it is.

1404GunDeaths03

Three important rules about infographics that I’m making up right here:

Rule 1: A graphic must be clear. If it’s not clear, then it’s not doing its job and should probably be put out of its misery.

Rule 2: It’s OK for a graphic to offer the reader a longer, more complicated view that requires more time spent observing a piece. But that’s not typically the job of a freakin’ one-column graphic.

Rule 3: Occasionally, it’s OK to flip a graphic upside down. But you’d better have a damned good reason for doing it. Other than, y’know, “I thought it’d look cool.”

This graphic fails all three: It’s not immediately clear — at least to many readers — and it’s a small graphic. So it has no business getting fancy. If the artist had a reason for turning it upside down, that reason eludes me.

Read more about the debate over this piece at…

UPSIDE DOWN YOU’RE TURNING ME

Full disclosure: I feel a little guilty criticizing this piece because I myself did something funky last week: I turned a map upside down:

Unnamed_CCI_EPS

That ran in the middle of a page about John Steinbeck‘s The Grapes of Wrath. The intent was to show the route the fictional Joad family took in the book from the dust bowl of Oklahoma to what they hoped would be a better life here in Southern California. But I really wanted to get those two pictures in there, which needed to read from left to right. I wanted those to sit atop my map showing the journey. I tried mapping it the usual way, but it was difficult to get the reader to stop — and then read this one segment of my page from right to left — and then resume reading the rest of the page from left to right. This would take quite a bit more vertical space and some very careful use of labels. And I was plum out of vertical space. So I elected to flop the map upside down. My logic: This time, it was more important to follow the narrative — to feel the twists and turns in the Joads’ journey — than to take in the geographical details of the trip. If the upside-down map was vetoed, Plan B would have been to kill the map and run the list of cities in a timeline-like format. There was just one problem with that: I already had a timeline on the page, just above the map:

Unnamed_CCI_EPS

We debated this and decided I was right to flip the map — this time. I can’t imagine too many times we’d ever want to run a map with the north arrow pointing down. And, y’know, perhaps we did the wrong thing. Another editor might have made a different choice. But the point is: We made a conscious decision here to let the map support the narrative. I don’t know what point Reuters was making with its upside-down fever chart. Whatever it was, it’s not apparent to me. It’s OK to make unusual choices. Just make sure your data is clear, your story is clear, and readers don’t walk away from your piece puzzled as hell.

WHEN IS A MAP NOT A MAP?

This seems like a good time to present the other infographics debacle this week. This one is by NBC News.

1404Demographics

Oh, dear. I was just talking about using a map when the map wasn’t the most important element. What we have here is another fever chart, but this one has been pasted inside a map of the U.S. This has a number of effects that harm the greater good we do by presenting the data in the first place. Fever charts (and pie charts and bar charts and most other charts, for that matter) are all about showing proportions. If the proportions get screwed up — by, say, varying the widths of your bars or by covering up part of the chart — then the reader can’t make the visual comparisons you’re asking her to make. And that’s the case here: We see territory marked as “Asian” in the upper left of the chart and also at the upper right. But where is that set of data in 2010? I’m guessing it’s there, but it’s hidden outside the area of the map.

Rule 4: If you’re going to hide important parts of your chart, then your chart is no good. And, yes, it should be put out of its misery.

The data is displayed over a map. What is the artist trying to tell us? Where white people live in the U.S.? That Hispanics only live near Canada and Asians in Washington State and New England? No, the map is merely a decorative element. It has nothing at all to do with the data.

Rule 5: If you don’t need an element to tell your story, then eliminate it. Or I will.

Rule 6: If your decorative element gets in the way of your story, then not only do I demand you eliminate it, I also insist you come over here so I can smack you upside your head.

Rule 7: Don’t use a map if you’re not telling a story that includes some type of data that needs geographical context.

Oh, and don’t forget this last one:

Rule 8: Don’t tilt a map or turn it upside down. Not unless you have a good reason.

Go here to read more about the perils of rotating maps.

An Introduction to Data Blending – Part 4 (Data Blending Design Principles)

Readers:

In Part 3 of this series on data blending, we examined the benefits of blending data. We also reviewed an example of data blending that illustrated the possible outcomes of an election for the District 2 Supervisor of San Francisco.

Today, in Part 4 of this series, we will discuss data blending design principles and show another illustrative example of data blending using Tableau.

Again, much of Parts 1, 2, 3 and 4 are based on a research paper written by Kristi Morton from The University of Washington (and others) [1].

You can learn more about Ms. Morton’s research as well as other resources used to create this blog post by referring to the References at the end of the blog post.

Best Regards,

Michael

Data Blending Design Principles

In Part 4, we describe the primary design principles upon which Tableau’s data blending feature was based. These principles were influenced by the application needs of Tableau’s end-users. In particular, the blending system was designed to integrate datasets on-the-fly, be responsive to change, and be driven by the visualization. Additionally, the designers assumed that the user may not know exactly what she is looking for initially, and needs a flexible, interactive system that can handle exploratory visual analysis.

Push Computation to Data and Minimize Data Movement

Tableau’s approach to data visualization allows users to leverage the power of a fast database system. Tableau’s VizQL algebra is a declarative language for succinctly describing visual representations of data and analytics operations on the data. Tableau compiles the VizQL declarative formalism representing a visual specification into SQL or MDX and pushes this computation close to the data, where the fast database system handles computationally intensive aggregation and filtering operations. In response, the database provides a relatively small result set for Tableau to render. This is an important factor in Tableau’s choice of post-aggregate data integration across disparate data sources – since the integrated result sets must represent a cognitively manageable amount of information, the data integration process operates on small amounts of aggregated, filtered data from each data source. This approach avoids the costly migration effort to collocate massive data sets in a single warehouse, and continues to leverage fast databases for performing expensive queries close to the data.

Automate as Much as Possible, but Keep User in Loop

Tableau’s primary focus has been on ease of use, since most of Tableau’s end-users are not database experts but come from a variety of domains and disciplines: business analysts, journalists, scientists, students, etc. This led them to take a simple, pay-as-you-go integration approach in which the user invests minimal upfront effort or time to receive the benefits of the system. For example, the data blending system does not require the user to specify schemas for their data sets; rather, the system tries to infer this information as well as how to apply schema matching techniques to blend them for a given visualization. Furthermore, the system provides a simple drag-and-drop interface for the user to specify the fields for a visualization, and if there are fields from multiple data sources in play at the same time, the blending system infers how to join them to satisfy the needs of the visualization.

In the case that something goes wrong, for example, if the schema matching does not succeed, the blending system provides a simple interface for specifying data source relationships and how blending should proceed. Additionally, the system provides several techniques for managing the impact of dirty data on blending, which we discuss in more detail in Part 5 of this series.

Another Example: Patient Falls Dashboard [3]

NOTE: The following example is from Jonathan Drummey via the Drawing with Numbers blog site. The example uses Tableau v7, but at the end of the instructions on how he creates this dashboard in Tableau v7, Mr. Drummey includes instructions on how the steps became simpler in Tableau v8. I have included a reference to this blog post on his site in the reference section of my blog entry. The “I”, “me” voice you read in this example is that of Mr. Drummey.

As part of improving patient safety, we track all patient falls in our healthcare system, and the number of patient days – the total number of days of inpatient stays at the hospital. Every month we report to the state our “fall rate,” a metric of the number of falls with injury for certain units in the hospital per 1000 patient days, i.e. days that patients are at the hospital. Our annualized target is to have less than 0.7 falls with injury per 1000 patient days.

A goal for our internal dashboard is to show the last 13 months of fall rates as a line chart, with the most recent fall events as a bar chart, in a combined chart, along with a separate text table showing some details of each fall event. Here’s the desired chart, with mocked-up data:

 

combo bars and lines

On the surface, blending this data seems really straightforward. We generate a falls rate every month for every reporting unit, so use that as the primary, then blend in the falls as they happen. However, this has the following issues:

  • Sparse Data – As I’m writing this, it’s March 7th. We usually don’t get the denominator of the patient days for the prior month (February) for a few more days yet, so there won’t be any February row of measure data to use as the primary to get the February fall events to show on the dashboard. In addition, there still wouldn’t be any March data to get the March fall events. Sometimes when working with a blend, the solution is to flip our choices for the primary and secondary data source. However, that doesn’t work either, because a unit might go for months or years without a patient fall, so there wouldn’t be any fall events to blend in the measure data.
  • Falls With and Without Injury – In the bar chart, we don’t just want to show the number of patient falls, we want to break down the falls by whether or not they were falls with injury – the numerator for the fall rate metric – and all other falls. The goal of displaying that data is to help the user keep in mind that as important as it is to reduce the number of falls with injury, we also need to keep the overall number of falls down as well. No fall = no chance of fall with injury.
  • Unit Level of Detail – Because the blend needs to work at the per-unit level of detail as well as across all reporting units, that means (in version 7 at least) that the Unit needs to be in the view for the blend to work. But we want to display a single falls rate no matter how many units are selected.

Sparse Data

To deal with issue of sparse data, there are a few possible solutions:

  • Change the combined line and bar chart into separate charts. This would perhaps be the easiest, though it would require some messing about with filters, hidden reference lines, and continuous date axes to ensure that the two charts had similar axis ranges no matter what. However, it would miss out on the combined chart’s key capability: directly seeing how a fall contributes to the fall rate. In addition, there would be no reason to write this blog post. :)
  • Perform padding in the data source, either via a query/view or Custom SQL. In an earlier version of this project I’d built this, and maintaining a bunch of queries with Cartesian joins isn’t my favorite cup of tea.
  • Build a scaffold data source with all combinations of the month and unit, and use the scaffold as the primary data source. While possible, this introduces maintenance issues when there’s a need for additional fields at a finer level of detail. For example, the falls measure actually has three separate fall rates – monthly, quarterly, and annual. These are generated as separate rows in our measures data, and the particular duration is indicated by the Period field. So the scaffold source would have to include the Period field to get the data, but that could be too much detail for the blended fall event data, and would make for more complexity in the calculations to ensure the aggregations worked properly.
  • Do a tiny bit of padding in the query, then do the rest in Tableau via Show Missing Values, aka domain padding. As I’d noted in an earlier post on blending, domain padding occurs before data is blended, so we can pad out the measure data through the current date and then include all the falls. This is the technique I chose, because adding one padding row to the data is trivial and turning on Show Missing Values is a couple of mouse clicks. Here’s how I did that:

In my case, the primary data source is a Microsoft Access query that gets the falls measure results from a table that also holds results for hundreds of other metrics that we track. I created a second query with the same number of columns that returns Null for every field except the Measure Date, which has a value of 1/1/1900. A third query then UNIONs those two queries together, and that’s what is used as the data source in Tableau.
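Collapsed into a single statement, the union looks something like the sketch below. The table and field names are my guesses, not from the original post, and the second branch is the single all-Null padding row dated 1/1/1900 (Access date literals use the #…# syntax; the TOP 1 … FROM trick is one way to select constants in Access):

SELECT MeasureDate, Unit, Period, Numerator, Denominator
FROM MeasureResults
WHERE MeasureName = 'Patient Falls'
UNION ALL
SELECT TOP 1 #1/1/1900#, Null, Null, Null, Null
FROM MeasureResults;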

Then, in Tableau, I added a calculated field called Date with the following formula:

//used for padding out display to today
IF [Measure Date] == #1/1/1900# THEN 
    TODAY() 
ELSE 
    [Measure Date] 
END

The measure results data contains a row per measure, reporting unit, and period. These are pre-calculated because the data is used in a variety of different outputs. Since in this dashboard we are combining the results across units, we can’t just use the pre-calculated rate; we need to go back to the original numerator and denominator. So, I also created a new field for the Calculated Rate:

SUM([Numerator])/SUM([Denominator])
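To see why we need the raw numerator and denominator rather than the pre-computed rate, here’s a made-up example:

Unit A: 2 falls w/injury over 1,000 patient days → rate 2.00
Unit B: 1 fall  w/injury over 4,000 patient days → rate 0.25

average of the rates:  (2.00 + 0.25) / 2                  = 1.125  (wrong)
recomputed from sums:  (2 + 1) / (1,000 + 4,000) × 1,000  = 0.60   (right)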

Now it’s possible to start building the line chart view:

  1. Put the Month(Date) – the full month/year version as a discrete – on Columns, Calculated Rate on Rows, Period on the Color Shelf. This only shows the data that exists in the data source, including the empty value for the current month (March in this case):

 

[Screenshot: line chart showing only the data that exists in the source]

 

  2. Turn on Show Missing Values for Month(Date) to start domain padding. Now we can see the additional column(s) that Tableau has added in for the missing month(s) – February in this case, between January and the current month:

 

[Screenshot: line chart with the missing months padded in]

 

With a continuous (green pill) date, this particular set-up won’t work in version 8: Tableau’s domain padding is not triggered when the last value of the measure is Null. I’m hoping this is just an issue with the beta; I’ll revise this section with an update once I find out what’s going on.

Even though the measure data only has end-of-month dates, I used Month(Date) instead of Exact Date because of two combined factors: first, the default import of most date fields from MS Jet sources turns them into DateTime fields; second, Show Missing Values won’t work on an Exact Date for a DateTime field – you have to assign an aggregation to the DateTime (even Second will work). This is because domain padding at that level of detail can create an immense number of new rows and cause Tableau to run out of memory, so Tableau keeps the option off unless you ask for it. Note that you can turn on Show Missing Values for an Exact Date on a Date field.

  3. Now for some cleanup steps: for the purposes of this dashboard, filter Period to remove Monthly (we do quarterly reporting), but leave in Null because that’s needed for the domain padding.
  4. Right-click Null on the Color Legend and Hide it. We Hide rather than Exclude because excluding would remove the extra row that drives the domain padding.
  5. Set up a relative date filter on the Date field for the last 13 months. This filter works just fine with the domain padding.

Filtering on Unit

Here’s a complicating factor: If we add a filter on Unit, there’s a Null listed here:

 

[Screenshot: Unit filter showing a Null entry]

I’d like to see just the list of units. But if we filter that Null out, we lose the domain padding – the last date shown is now January 2013:

 

[Screenshot: view with the Null filtered out – domain padding lost]

 

One solution here would be to alter the padding to add a padding row for every unit, instead of just one. But since Tableau doesn’t let us simply hide elements in a filter, and there are more reporting units in our production data than we display on the dashboards – yet the all-unit rate needs to include all of the data – I chose to use a parameter filter instead. Setting this up involved a parameter with All and each of the units, and a calculated field called “Chosen Unit Filter” with the following formula, which is set to Filter on False:

[Choose Unit] == "All" OR [Choose Unit] == [Unit]

Falls With and Without Injury

In a fantasy world, to create the desired stacked bars I’d be able to drag the Number of Records from the secondary data source (i.e. the number of fall events), drag an Injury indicator onto the Color Shelf, and be done. However, that runs into the issue of having a finer level of detail in the secondary than in the primary, which I’ll walk through solutions for in the next section. In this case, since there are only two different numbers, the easy way is to generate two separate measures, then use Measure Names/Measure Values to create the stacked bars – Measure Values on Rows, and Measure Names on the Color Shelf. Here’s the basic calculation for Falls with Injury:

SUM(IF [Injury] != "None" THEN 1 ELSE 0 END)

We’re using a row-level calculated field to generate the measure, and a slightly different calc for Falls w/out Injury.
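The original post doesn’t spell out that second calc, but presumably Falls w/out Injury is just the complement:

//assumed complement: count the falls where no injury occurred
SUM(IF [Injury] = "None" THEN 1 ELSE 0 END)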

Unit Level of Detail

When we want to blend in Tableau at a finer level of detail and aggregate to a higher level, historically there have been three options:

  • Don’t use blending at all; instead use a query to perform the “blend” outside of Tableau. When the data lives in totally different data sources this can be more difficult, but it’s not impossible – you can use one of the systems, or a separate system, to create a federated data source, for example by adding your Oracle table as an ODBC connection to your Excel data and then querying that. In this case, we don’t have to do that.
  • Use Tableau’s Primary Groups feature to “push” the detail from the secondary into the primary data source. This is a really helpful feature; the one drawback is that it’s not dynamic, so any time there are new groupings in the secondary it would have to be re-run. Personally, I prefer automating as much as possible, so I tend not to use this technique.
  • Set up the view with the needed dimensions in the view – on the Level of Detail Shelf, for example – and then use table calculations to do the aggregation. This is how I’ve typically built this kind of view.

Tableau version 8 adds a fourth option:

  • Tell Tableau what fields to blend on, then bring in your measures from the secondary.

I’ll walk through the table calculation technique, which works the same in version 7 and version 8, and then how to take advantage of v8’s new feature.

Using Table Calculations to Aggregate Blended Data

In order to blend the falls data at the hospital unit level – to make sure that we’re only showing falls for the selected unit(s) – the Unit has to be in the view (on the Rows, Columns, or Pages Shelves, or on the Marks Card). Since we don’t actually need to display the Unit, the Level of Detail Shelf is where we’ll put that dimension. However, just adding it to the view leads to a bar for each unit; for example, for April 2012 one unit had one fall with injury and another had two, and two units each had two falls without injury.

 

[Screenshot: bar chart with a separate bar per unit]

 

To control things like tooltips (along with performance in some cases), it’s a lot easier to have a single bar for each month/measure. To do that, we turn to a table calculation. Here’s the Falls w/Injury for v7 Blend calculated field, set up in the secondary data source:

//one mark per partition, so we get a single bar per month/measure
IF FIRST()==0 THEN
	//total across all units in the partition
	TOTAL([Falls w/Injury])
END

This table calculation has a Compute Using of Unit, so it partitions on the Month of Date. The IF FIRST()==0 part ensures that there is only one mark per partition. I’m using the TOTAL() aggregation here because it’s easier to set up and maintain. The alternative is to use WINDOW_SUM(), but in versions prior to Tableau 8 it has some performance issues, so the calc would be:

IF FIRST()==0 THEN
	//window bounds restricted to one row as a v7 performance optimization
	WINDOW_SUM(SUM([Falls w/Injury]), 0, IIF(FIRST()==0, LAST(), 0))
END

The 0, IIF(FIRST()==0, LAST(), 0) part is necessary in version 7 to optimize performance; you can get rid of it in version 8.

You can also do a table calculation in the primary that accesses fields in the secondary, however TOTAL() can’t be used across blended data sources, so you’d have to use the WINDOW_SUM version.

With a second table calculation for the Falls w/out Injury, now the view can be built, starting with the line chart from above:

  1. Add Measure Names (from the Primary) to Filters Shelf, filter it for a couple of random measures.
  2. Put Measure Values on the Rows Shelf.
  3. Click on the Measure Values pill on Rows to set the Mark Type to Bar.
  4. Drag Measure Names onto the Color Shelf (for the Measure Values marks).
  5. Drag Unit onto the Level of Detail Shelf (for the Measure Values marks).
  6. Switch to the Secondary to put the two Falls for v7 Blend calcs onto the Measure Values Shelf.
  7. Set their Compute Usings to Unit.
  8. Remove the 2 measures chosen in step 1.
  9. Clean up the view – turn on dual axes, move the secondary axis marks to the back, change the axis tick marks to integers, set axis titles, etc.

This is pretty cool: we’re using domain padding to fill in for non-existent data, and then having a blend happen at one level of detail while aggregating to another, just for the second axis. Here’s the v7 workbook on Tableau Public:

[Image: Patient Falls Dashboard – click on image to go to Tableau Public]

Tableau Version 8 Blending – Faster, Easier, Better

For version 8, Tableau made it possible to blend data without requiring the linking fields in the view. Here’s how I build the above v7 view in v8:

  1. Add Measure Names (from the Primary) to Filters Shelf, filter it for a couple of random measures.
  2. Put Measure Values on the Rows Shelf.
  3. Click on the Measure Values pill on Rows to set the Mark Type to Bar.
  4. Drag Measure Names onto the Color Shelf (for the Measure Values marks).
  5. Switch to the Secondary and click the chain link icon next to Unit to turn on blending on Unit.
  6. Drag the Falls w/Injury and Falls w/out Injury calcs onto the Measure Values Shelf.
  7. Remove the 2 measures chosen in step 1.
  8. Clean up the view – turn on dual axes, move the secondary axis marks to the back, change the axis tick marks to integers, set axis titles, etc.

The results will be the same as in v7.

Next: Tableau’s Data Blending Architecture

———————————————————————————-

References:

[1] Kristi Morton, Ross Bunker, Jock Mackinlay, Robert Morton, and Chris Stolte, Dynamic Workload Driven Data Integration in Tableau, University of Washington and Tableau Software, Seattle, Washington, March 2012, http://homes.cs.washington.edu/~kmorton/modi221-mortonA.pdf.

[2] Hans Rosling, Wealth & Health of Nations, Gapminder.org, http://www.gapminder.org/world/.

[3] Jonathan Drummey, Tableau Data Blending, Sparse Data, Multiple Levels of Granularity, and Improvements in Version 8, Drawing with Numbers, March 11, 2013, http://drawingwithnumbers.artisart.org/tableau-data-blending-sparse-data-multiple-levels-of-granularity-and-improvements-in-version-8/.

 

An Introduction to Data Blending – Part 3 (Benefits of Blending Data)

Readers:

In Part 2 of this series on data blending, we delved deeper into understanding what data blending is. We also examined how data blending is used in Hans Rosling’s well-known Gapminder application.

Today, in Part 3 of this series, we will dig even deeper by examining the benefits of blending data.

Again, much of Parts 1, 2 and 3 are based on a research paper written by Kristi Morton from The University of Washington (and others) [1].

You can learn more about Ms. Morton’s research as well as other resources used to create this blog post by referring to the References at the end of the blog post.

Best Regards,

Michael

Benefits of Blending Data

In this section, we will examine the advantages of using the data blending feature for integrating datasets. Additionally, we will review another illustrative example of data blending using Tableau.

Integrating Data Using Tableau

In Ms. Morton’s research, Tableau was equipped with two ways of integrating data. First, in the case where the data sets are collocated (or can be collocated), Tableau formulates a query that joins them to produce a visualization. However, in the case where the data sets are not collocated (or cannot be collocated), Tableau federates queries to each data source, and creates a dynamic, blended view that consists of the joined result sets of the queries. For the purpose of exploratory visual analytics, Ms. Morton (et al) found that data blending is a complementary technology to the standard collocated approach with the following benefits:

  • Resolves many data granularity problems
  • Resolves collocation problems
  • Adapts to needs of exploratory visual analytics

Figure 1 - Company Tables

Image: Kristi Morton, Ross Bunker, Jock Mackinlay, Robert Morton, and Chris Stolte, Dynamic Workload Driven Data Integration in Tableau. [1]

Resolving Data Granularity Problems

Oftentimes a user wants to combine data that may not be at the same granularity (i.e. they have different primary keys). For example, let’s say that an employee at company A wants to compare the yearly growth of sales to a competitor, company B. The dataset for company B (see Figure 1 above) contains detailed quarterly sales growth for B (quarter, year is the primary key), while company A’s dataset only includes yearly sales (year is the primary key). If the employee simply joins these two datasets on year, then each row from A will be duplicated for each quarter in B for a given year, resulting in an inaccurate overestimate of A’s yearly earnings.

This duplication problem can be avoided if, for example, company B’s sales dataset is first aggregated to the level of year and then joined with company A’s dataset. In this case, data blending detects that the data sets are at different granularities by examining their primary keys, and notes that in order to join them, the common field is year. To join them on year, an aggregation query is issued to company B’s dataset, which returns the sales aggregated up to the yearly level as shown in Figure 1. This result is blended with company A’s dataset to produce the desired visualization of yearly sales for companies A and B.

The blending feature does all of this on the fly, without user intervention.
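In SQL terms, the blend’s behavior resembles this sketch – the table and column names are mine, for illustration only:

SELECT a.year,
       a.sales AS company_a_sales,
       b.sales AS company_b_sales
FROM company_a AS a
JOIN (SELECT year, SUM(sales) AS sales
      FROM company_b
      GROUP BY year) AS b
  ON a.year = b.year;

Because company B’s quarterly rows are rolled up to one row per year before the join, company A’s yearly sales are never duplicated.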

Resolving Collocation Problems

As mentioned in Part 1, a managed repository is expensive and often untenable. In other cases, the data repository may have a rigid structure, as with cubes, to ensure performance, support security, or protect data quality. Furthermore, it is often unclear whether it is worth the effort of integrating an external data set that has uncertain value. The user may not know until she has started exploring the data whether it has enough value to justify spending the time to integrate it and load it into her repository.

Thus, one of the paramount benefits of data blending is that it allows the user to quickly start exploring their data; as they explore, the integration happens automatically as a natural part of the analysis cycle.

An interesting final benefit of the blending approach is that it enables users to seamlessly integrate across different types of data (which usually exist in separate repositories) such as relational, cubes, text files, spreadsheets, etc.

Adapting to the Needs of Exploratory Visual Analytics

A key benefit of data blending is its flexibility; it gives the user the freedom to view their blended data at different granularities and control how data is integrated on-the-fly. The blended views are dynamically created as the user is visually exploring the datasets. For example, the user can drill-down, roll-up, pivot, or filter any blended view as needed during her exploratory analysis. This feature is useful for data exploration and what-if analysis.

Another Illustrative Example of Data Blending

Figure 2 (below) illustrates the possible outcomes of an election for District 2 Supervisor of San Francisco. With this type of visualization, the user can select different election styles and see how their choice affects the outcome of the election.

What’s interesting from a blending standpoint is that this is an example of a many-to-one relationship between the primary and secondary datasets. This means that the fields being left-joined in from the secondary data source match multiple rows in the primary dataset, resulting in those values being duplicated. Any subsequent aggregation operations would then reflect this duplicate data, producing overestimates. The blending feature, however, prevents this scenario by performing all aggregation prior to duplicating data during the left join.
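A rough SQL analogy (generic names, my own sketch, not from the paper): instead of left-joining the raw rows and then aggregating – which would count the duplicated secondary values multiple times – the blend aggregates each source to the view’s level of detail first and then joins the aggregates:

SELECT p.blend_key,
       p.primary_measure,
       s.secondary_measure
FROM (SELECT blend_key, SUM(measure) AS primary_measure
      FROM primary_source GROUP BY blend_key) AS p
LEFT JOIN
     (SELECT blend_key, SUM(measure) AS secondary_measure
      FROM secondary_source GROUP BY blend_key) AS s
  ON p.blend_key = s.blend_key;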

Figure 2 - San Francisco Election

 Image: Kristi Morton, Ross Bunker, Jock Mackinlay, Robert Morton, and Chris Stolte, Dynamic Workload Driven Data Integration in Tableau. [1]

Next: Data Blending Design Principles

——————————————————————————————————–

References:

[1] Kristi Morton, Ross Bunker, Jock Mackinlay, Robert Morton, and Chris Stolte, Dynamic Workload Driven Data Integration in Tableau, University of Washington and Tableau Software, Seattle, Washington, March 2012, http://homes.cs.washington.edu/~kmorton/modi221-mortonA.pdf.

[2] Hans Rosling, Wealth & Health of Nations, Gapminder.org, http://www.gapminder.org/world/.