Tag Archives: Tableau

Bryan Brandow: Triggering Cubes & Extracts using Tableau or MicroStrategy


Bryan Brandow (photo, right), a Data Engineering Manager for a large social media company, is one of my favorite bloggers out there when it comes to thought leadership and digging deep into the technical aspects of Tableau and MicroStrategy. Bryan just posted about triggering cubes and extracts on his blog. Here is a brief synopsis.

One of the functions that never seems to be included in BI tools is an easy way to kick off an application cache job once your ETL is finished. MicroStrategy’s Cubes and Tableau’s Extracts both rely on manual or time-based refresh schedules, but this leaves you in a position where your data will land in the database and you’ll either have a large gap before the dashboard is updated or you’ll be refreshing constantly and wasting lots of system resources. They both come with command-line tools for kicking off a refresh, but then it’s up to you to figure out how to link your ETL jobs to call these commands. What follows is a solution that works in my environment and will probably work for yours as well. There are of course a lot of ways for your ETL tool to tell your BI tool that it’s time to refresh a cache, but this is my take on it. You won’t find a download-and-install software package here since everyone’s environment is different, but you will find ample blueprints and examples for how to build your own for your platform and for whatever BI tool you use (from what I’ve observed, this setup is fairly common). Trigger was first demoed at the Tableau Conference 2014. You can jump to the Trigger demo here.

I recommend you click on the link above and give his blog post a full read. It is well worth it.

Best regards,

Michael

Bryan’s BI Blog: MicroStrategy vs Tableau

Readers:

Bryan Brandow has posted the second entry on his new blog, Bryan’s BI Blog, and it is a doozy. Bryan does an in-depth comparison of MicroStrategy vs. Tableau.

Here is a link to the MicroStrategy vs. Tableau post.

Best Regards,

Michael

 

An Introduction to Data Blending – Part 5 (Tableau’s Data Blending Architecture)

Readers:

In Part 4 of this series on data blending, we reviewed Tableau’s Data Blending Principles. We also reviewed an example of data blending in Jonathan Drummey’s Patient Falls Dashboard. [3]

Today, in Part 5 of this series, we will peel the onion a bit more and look at Tableau’s Data Blending Architecture.

Again, much of Parts 1 through 5 is based on a research paper written by Kristi Morton of the University of Washington and her co-authors [1].

You can learn more about Ms. Morton’s research as well as other resources used to create this blog post by referring to the References at the end of the blog post.

Best Regards,

Michael

Integrating Data in Tableau

In Part 5, we discuss in greater detail how data blending works. Then we look at how a user builds visualizations with data blending, using several large datasets of airline statistics.

Data Blending Architecture

Part 5 - Figure 1

The data blending system, shown in Figure 1 above, takes as input the VizQL query workload generated by the user’s GUI actions and data source schemas, and automatically infers how to query the data sources remotely and combine their results on-the-fly. The system features a two-tier mediator-based architecture in which the VizQL query workload is analyzed and partitioned at runtime based on the corresponding data source fields being used. The primary mediator initiates this process by removing the visual encodings from the VizQL query workload to yield an abstract query. The abstract query is partitioned for further processing by the primary mediator and one or more secondary mediators. The primary mediator creates the mediated schema for the given query workload. It then federates the abstract queries to the primary data source as well as the secondary mediators and their respective data sources. The wrappers compile the abstract queries into concrete SQL or MDX queries and instantiate the semantic mappings between the data sources and the mediated schema for each query. The primary mediator joins all the result sets returned from all data sources to produce the mediated result set used by the rendering system. [1]
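To make the division of labor in Figure 1 more concrete, here is a minimal, hypothetical Python sketch of the wrapper and mediator roles. The names used below (AbstractQuery, Wrapper, the tickets and fuel_costs tables) are invented for illustration and are not Tableau’s internal API; the sketch only mirrors the steps the paragraph above describes.

```python
from dataclasses import dataclass

@dataclass
class AbstractQuery:
    """A VizQL workload with its visual encodings stripped away (primary mediator step)."""
    group_by: list      # the visualization's grouping fields
    aggregates: dict    # measure name -> aggregation function

class Wrapper:
    """Compiles an abstract query into a concrete SQL query for one data source."""
    def __init__(self, table):
        self.table = table

    def compile(self, q):
        select = q.group_by + [f"{fn}({col}) AS {col}" for col, fn in q.aggregates.items()]
        return (f"SELECT {', '.join(select)} FROM {self.table} "
                f"GROUP BY {', '.join(q.group_by)}")

# The primary mediator partitions the abstract workload per data source, hands each
# piece to that source's wrapper, and later left-joins the returned result sets.
primary_q = AbstractQuery(group_by=["year"], aggregates={"airfare": "AVG"})
secondary_q = AbstractQuery(group_by=["year"], aggregates={"fuel_cost": "AVG"})
print(Wrapper("tickets").compile(primary_q))
print(Wrapper("fuel_costs").compile(secondary_q))
```

The join of the per-source result sets is what the next section calls a post-aggregate join.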

Part 5 - Figure 2

Post-aggregate Join

A visualization is organized by its discrete fields into pages, partitions, colors, etc., and like a GROUP BY clause in SQL, these grouping fields comprise the primary key of the visualization. In a blended visualization, the grouping fields from the primary data source become the primary key of the mediated schema. In Figure 2 above, these are shown as the dark-green fields in the primary data source, and the light-green fields represent the aggregated data. Each secondary data source must contain at least one field that matches a visualization grouping field in order to blend into the mediated schema. The matching fields in a secondary data source comprise its join key, and these fields appear in the GROUP BY clause issued by the secondary mediator wrappers. The aggregated data from the secondary data source, shown in light-purple, is then left-joined along its join key into the mediated result set. Morton et al. refer to this left-join of aggregated result sets as a post-aggregate join. [1]
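As a concrete illustration of a post-aggregate join, here is a small sketch in Python using pandas. The tables, field names, and numbers are invented for the example; it sketches the idea rather than Tableau’s actual implementation.

```python
import pandas as pd

# Primary data source: one row per ticket; the visualization groups by year.
primary = pd.DataFrame({
    "year":    [2010, 2010, 2011, 2011],
    "airfare": [200,  300,  250,  350],
})

# Secondary data source: fuel prices, also finer-grained than the visualization.
secondary = pd.DataFrame({
    "year":      [2010, 2010, 2011],
    "fuel_cost": [2.9,  3.1,  3.4],
})

# Each source is first aggregated to the visualization's grouping fields ...
primary_agg   = primary.groupby("year", as_index=False)["airfare"].mean()
secondary_agg = secondary.groupby("year", as_index=False)["fuel_cost"].mean()

# ... and only then are the aggregated result sets left-joined along the join key.
# Aggregate first, join second: the post-aggregate join.
blended = primary_agg.merge(secondary_agg, on="year", how="left")
print(blended)
```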

Primary Key Cardinality

There is at most a many-to-one mapping between the domain values of the primary key and those of the secondary join key, because the secondary join key is a subset of the primary key and contains only unique values in the aggregated secondary result set. Morton et al. find that this approach is the most natural for augmenting a visualization with secondary data sources of uncertain value or quality, which is a common scenario for Tableau users.

Data blending supports many-to-one relationships between the primary and each secondary. This can occur when the secondary data source contains coarser-grained data than the mediated result set, as discussed in Part 3 of this series.

Since the join key in a secondary result set may match a subset of the blended result set's primary key, portions of the secondary result set may be duplicated across repeated values in the mediated result set. This does not pose a risk of double-counting measure values, because all aggregation is performed prior to the join. When a blended visualization uses multiple secondary data sources, each secondary join key may match any subset of the primary key. The primary mediator handles duplicating each secondary result set as needed to join with the mediated result set.
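A short, hypothetical pandas sketch of why this is safe: the secondary source is aggregated down to its join key (a subset of the primary key) before the left-join, so its values replicate across primary-key rows but are never re-aggregated. Joining the raw rows first and aggregating afterwards, by contrast, would inflate the totals.

```python
import pandas as pd

# Visualization primary key: (year, carrier). The secondary source has only 'year',
# so its join key is a subset of the primary key.
primary = pd.DataFrame({
    "year":    [2010, 2010, 2011],
    "carrier": ["AA", "DL", "AA"],
    "flights": [120,  90,   150],
})
secondary = pd.DataFrame({
    "year":      [2010, 2010, 2011],
    "fuel_cost": [2.8,  3.2,  3.4],
})

# Aggregate the secondary source to its join key BEFORE joining ...
fuel_by_year = secondary.groupby("year", as_index=False)["fuel_cost"].mean()

# ... so the value replicated across carriers is already an aggregate and is
# never summed again by the blend.
blended = primary.merge(fuel_by_year, on="year", how="left")
print(blended)

# Joining raw rows first would let every carrier/year pair pick up every matching
# fuel row, double-counting the measure if it were aggregated afterwards.
naive = primary.merge(secondary, on="year", how="left")
print(naive.groupby("year")["fuel_cost"].sum())
```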

Finally, a secondary dimension which is not part of the join key (and thus not a grouping field in the secondary query) can still be used in the visualization. If it is functionally dependent on the join key, a secondary dimension can be used without affecting the result set cardinality. Tableau references this kind of non-grouping dimension using both MIN and MAX aggregations in the query issued to the secondary data source, which allows Tableau to determine whether the dimension is functionally dependent on the join key. For each row in the secondary result set, if the two aggregated values are the same, the value is used as-is, reflecting the functional dependence on the grouping fields. If the aggregated values differ, Tableau represents the value using a special form of NULL called ManyValues. This is represented in the visualization as a ‘*’, but retains the behavior of NULL when used in calculated fields or other computations. The visual feedback allows a user to distinguish this lack of data from the NULLs which occur due to missing or mismatched data.
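The MIN/MAX probe can be sketched in pandas as follows; the column names and data are invented, and the '*' here is just a display stand-in for what the paper calls ManyValues.

```python
import pandas as pd

# Secondary source with a non-grouping dimension, 'hub_city' (hypothetical data).
secondary = pd.DataFrame({
    "carrier":   ["AA", "AA", "DL", "DL"],
    "hub_city":  ["Dallas", "Dallas", "Atlanta", "Detroit"],   # DL has two values
    "fuel_cost": [3.0, 3.2, 2.9, 3.1],
})

# The secondary query asks for both MIN and MAX of the non-grouping dimension.
probe = secondary.groupby("carrier").agg(
    lo=("hub_city", "min"),
    hi=("hub_city", "max"),
    fuel_cost=("fuel_cost", "mean"),
).reset_index()

# MIN == MAX means the dimension is functionally dependent on the join key and the
# value is used as-is; otherwise it is shown as '*' (the ManyValues placeholder).
probe["hub_city"] = probe.apply(lambda r: r["lo"] if r["lo"] == r["hi"] else "*", axis=1)
print(probe[["carrier", "hub_city", "fuel_cost"]])   # AA -> Dallas, DL -> *
```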

Inferring Join Keys

Tableau uses very simple rules for automatically detecting candidate join keys (a minimal sketch follows the list):

  1. The secondary data source field name must match a field with the same name in the primary data source.
  2. The data types must match.
  3. If they are date/time fields, they must represent the same granularity date bin in the date/time hierarchy, e.g., both are MONTH.

A user can intervene to force a match, either by providing field captions to rename fields within the Tableau data model or by explicitly defining a link between fields using a simple user interface.
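Assuming two pandas DataFrames stand in for the primary and secondary sources, a rough sketch of these rules might look like the following. Rules 1 and 2 are checked directly; rule 3 needs date-bin metadata (MONTH, YEAR, and so on) that plain column types do not carry, so it is only noted in a comment.

```python
import pandas as pd

def candidate_join_keys(primary: pd.DataFrame, secondary: pd.DataFrame):
    """Return secondary fields that satisfy the name and type matching rules."""
    keys = []
    for name in secondary.columns:
        if name not in primary.columns:            # rule 1: names must match
            continue
        if primary[name].dtype != secondary[name].dtype:
            continue                               # rule 2: types must match
        # rule 3 (same date-bin granularity) would be checked against the data
        # model's date hierarchy, which is not represented here.
        keys.append(name)
    return keys

primary   = pd.DataFrame({"Flight Date": pd.to_datetime(["2010-01-01"]), "Airfare": [200.0]})
secondary = pd.DataFrame({"Flight Date": pd.to_datetime(["2010-01-01"]), "Fuel": [3.1]})
print(candidate_join_keys(primary, secondary))     # ['Flight Date']
```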

Part 5 - Figure 3

Another Simple Blending Example

A Tableau data blending scenario is shown in Figure 3 above, which includes multiple views that were composed in minutes by mashing up four different airline datasets, the largest of which are a 324-million-row ticket pricing database and a 140-million-row on-time performance database. A user starts by dragging fields from any dataset onto a blank visual canvas, iteratively building a VizQL statement which ultimately produces a visualization. In this example, the user first drags the VizQL fields, YEAR(Flight Date) and AVG(Airfare), from the pricing dataset onto the visual canvas.

Data blending occurs when the user adds fields from a separate dataset to an existing VizQL statement in order to augment their analysis. Tableau assigns the existing dataset to the primary mediator and uses secondary mediators to manage each subsequent dataset added to the VizQL. The mediated schema has a primary key composed of the grouping VizQL fields from the primary dataset (e.g. YEAR(Flight Date)); the remaining fields in the mediated schema are the aggregated VizQL fields from the primary dataset along with the VizQL fields from each secondary dataset.

Continuing our example, the user wishes to drag AVG(Total Cost per Gallon) from the fuel cost dataset to the visualization. The schema matching algorithm examines the secondary dataset for one or more fields whose name exactly matches a field in the primary key of the mediated schema. While the proposed matches are often sufficient and acceptable, the user can specify an override. Since the fuel cost dataset has a field named Date, the user provides a caption of Flight Date to resolve the schema discrepancy. At this point the mediated schema is created and the VizQL workload is then federated to the wrappers for each dataset. Each wrapper compiles VizQL to SQL or MDX for the given workload, executes the query, and maps the result set into the intermediate form expected by the primary mediator.

The mapping is performed dynamically, since both the VizQL and the data model evolve during a user’s iterative analytical workflow. Finally, the primary mediator performs a left-join of each secondary result set along the primary key of the mediated schema. In this example, the mediated result set is rendered to produce the visualization shown in Figure 3(a).

Evolved Blending Example

Figure 3(b) above shows further evolution of the analysis of airline datasets, and demonstrates several key points of data blending. First, the user adds a unique ID field named uniquecarrier from the primary dataset to the VizQL to visualize results for each airline ID over time. The mediated schema adapts by adding this field to its primary key, and the secondary mediator automatically queries the fuel cost dataset at this finer granularity since it too has a field named uniquecarrier. Next, the user decorates the visualization with descriptive airline names for each airline ID by dragging a field named Carrier Name from a lookup table.

This dataset is at a coarser granularity than the existing mediated schema, since it does not represent changes to the carrier name over time. The system described by Morton et al. automatically handles this challenge by allowing the left-join to use a subset of the mediated result set’s primary key and replicating the carrier name across the mediated result set. Figure 4 below demonstrates this effect using a tabular view of a portion of the mediated result set, along with portions of the primary and secondary result sets.

The figure also demonstrates how the left-join preserves data for years which have no fuel cost records. Last, the user adds average airline delays from a 140 million row dataset which matches on Flight Date and uniquecarrier. This is a fast operation, since the wrapper performs mapping operations on the relatively small, aggregated result set produced by the remote database. Note that none of these additional analytical tasks required the user to intervene in data integration tasks, allowing their focus to remain on finding insight in the data.
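A small pandas sketch of the effect Figure 4 illustrates, with invented numbers: the fuel-cost result set matches only part of the primary key's domain, the carrier-name lookup carries no year at all, and the left-joins both preserve the unmatched year and replicate the coarser-grained values.

```python
import pandas as pd

# Mediated result set keyed by (year, uniquecarrier), as in the evolved example.
primary_agg = pd.DataFrame({
    "year":          [2009, 2010, 2010, 2011],
    "uniquecarrier": ["AA", "AA", "DL", "DL"],
    "avg_airfare":   [210.0, 220.0, 190.0, 205.0],
})

# Fuel costs exist only for some years; the carrier lookup is coarser-grained (no year).
fuel_agg = pd.DataFrame({"year": [2010, 2011],
                         "uniquecarrier": ["AA", "DL"],
                         "avg_fuel_cost": [3.0, 3.4]})
carriers = pd.DataFrame({"uniquecarrier": ["AA", "DL"],
                         "carrier_name": ["American Airlines", "Delta Air Lines"]})

# Left-joins along (subsets of) the primary key: 2009 survives with a NULL fuel cost,
# and each carrier name is replicated across all of that carrier's years.
blended = (primary_agg
           .merge(fuel_agg, on=["year", "uniquecarrier"], how="left")
           .merge(carriers, on="uniquecarrier", how="left"))
print(blended)
```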

Part 5 - Figure 4

Filtering

Tableau provides several options for filtering data. Data may be filtered based on aggregate conditions, such as excluding airlines having a low total count of flights. A user can filter aggregate data from the primary and secondary data sources in this fashion, which results in rows being removed from the mediated result set. In contrast, row-level filters are only allowed for the primary data source. To improve the performance of queries sent to the secondary data sources, Tableau will filter the join keys to exclude values which are not present in the domain of the primary data source result set, since these values would be discarded by the left-join.
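As a rough sketch of that last optimization, assuming a hypothetical fuel_costs table: the domain of the primary result set's join key is pushed into the secondary query as a filter, so rows the left-join would discard anyway are never fetched.

```python
import pandas as pd

primary_result = pd.DataFrame({"uniquecarrier": ["AA", "DL"],
                               "avg_airfare":   [220.0, 195.0]})

# Join-key values present in the primary result set ...
primary_domain = primary_result["uniquecarrier"].unique().tolist()

# ... become a filter in the query sent to the secondary data source.
secondary_sql = (
    "SELECT uniquecarrier, AVG(fuel_cost) AS avg_fuel_cost "
    "FROM fuel_costs "
    f"WHERE uniquecarrier IN ({', '.join(repr(v) for v in primary_domain)}) "
    "GROUP BY uniquecarrier"
)
print(secondary_sql)
```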

Data Cleaning Capabilities

As mentioned in the Inferring Join Keys section above, Tableau supports user intervention in resolving field names when schema matching fails. Once the schemas match and data is blended, the visualization can help provide feedback regarding the validity of the underlying data values and domains. If there are any data inconsistencies, users can provide aliases for a field’s data values, which will override the original values in any query results involving that field. The primary mediator performs a left-join using the aliases of the data values, allowing users to blend data despite discrepancies from data entry errors and spelling variations. Tableau provides a simple user interface for editing field aliases. Calculated fields are another aspect of Tableau’s data model that supports data cleaning. Calculated fields support arbitrary transformations of original data values into new data values, such as trimming whitespace from a string or constructing a date from an epoch-based integer timestamp.

As with database fields, calculated fields can be used as primary keys or join keys.

Finally, Tableau allows users to organize a field’s related data values into groups. These ad-hoc groups can be used for entity resolution, such as binding multiple variations of business names to a canonical form. Ad-hoc groups also allow constructing coarser-grained structures, such as grouping states into regions. Data blending supports joins between two ad-hoc groups, as well as joins between an ad-hoc group and a string field.
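A minimal Python sketch of blending through an ad-hoc group, assuming an invented state-to-region grouping: both sides are mapped through the group, aggregated, and then left-joined, which is effectively a join between the ad-hoc group and a secondary string field.

```python
import pandas as pd

# An ad-hoc group binding states to coarser-grained regions.
region_of = {"WA": "West", "OR": "West", "NY": "East", "MA": "East"}

primary = pd.DataFrame({"state": ["WA", "OR", "NY", "MA"],
                        "sales": [100, 80, 120, 90]})
secondary = pd.DataFrame({"state": ["WA", "NY", "MA"],
                          "population_m": [7.8, 19.5, 7.0]})

# Map both sides through the group before aggregating ...
primary["region"] = primary["state"].map(region_of)
secondary["region"] = secondary["state"].map(region_of)

# ... then blend at the group's granularity with a post-aggregate left-join.
blended = (primary.groupby("region", as_index=False)["sales"].sum()
           .merge(secondary.groupby("region", as_index=False)["population_m"].sum(),
                  on="region", how="left"))
print(blended)
```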

 Next: Data Blending Using MicroStrategy

———————————————————————————-

References:

[1] Kristi Morton, Ross Bunker, Jock Mackinlay, Robert Morton, and Chris Stolte, Dynamic Workload Driven Data Integration in Tableau, University of Washington and Tableau Software, Seattle, Washington, March 2012, http://homes.cs.washington.edu/~kmorton/modi221-mortonA.pdf.

[2] Hans Rosling, Wealth & Health of Nations, Gapminder.org, http://www.gapminder.org/world/.

[3] Jonathan Drummey, Tableau Data Blending, Sparse Data, Multiple Levels of Granularity, and Improvements in Version 8, Drawing with Numbers, March 11, 2013, http://drawingwithnumbers.artisart.org/tableau-data-blending-sparse-data-multiple-levels-of-granularity-and-improvements-in-version-8/.

 

Has MicroStrategy Toppled Tableau as the Analytics King?

MicroStrategy Analytics

In a recent TDWI article titled Analysis: MicroStrategy’s Would-Be Analytics King, Stephen Swoyer, who is a technology writer based in Nashville, TN, stated that business intelligence (BI) stalwart MicroStrategy Inc. pulled off arguably the biggest coup at Teradata Corp.’s recent Partners User Group (Partners) conference, announcing a rebranded, reorganized, and — to some extent — revamped product line-up.

One particular announcement drew great interest: MicroStrategy’s free version of its discovery tool — Visual Insight — which it packages as part of a new standalone BI offering: MicroStrategy Analytics Desktop.

With Analytics Desktop, MicroStrategy takes dead aim at insurgent BI offerings from QlikTech Inc., Tibco Spotfire, and — most particularly — Tableau Software Inc.

MicroStrategy rebranded its products into three distinct groups: the MicroStrategy Analytics Platform (consisting of MicroStrategy Analytics Enterprise version 9.4, an updated version of its v9.3.1 BI suite); MicroStrategy Express (its cloud platform, available in both software- and platform-as-a-service subscription options); and MicroStrategy Analytics Desktop (a single-user BI discovery solution). MicroStrategy Analytics Enterprise takes a page from Tableau’s book via support for data blending, a technique that Tableau helped to popularize.

“We’re giving the business user the tools to join data in an ad hoc sort of environment, on the fly. That’s a big enhancement for us. The architectural work that we did to make that enhancement work resulted in some big performance improvements [in MicroStrategy Analytics Enterprise]: we improved our query performance for self-service analytics by 40 to 50 percent,” said Kevin Spurway, senior vice president of marketing with MicroStrategy.

Spurway — who, as an interesting aside, has a JD from Harvard Law School — said MicroStrategy implements data blending in much the same way that Tableau does: i.e., by doing it in-memory. Previous versions of MicroStrategy BI employed an interstitial in-memory layer, Spurway said; the performance improvements in MicroStrategy Analytics Enterprise result from shifting to an integrated in-memory design, he explained.

“It’s a function of just our in-memory [implementation]. Primarily it has to do with the way the architecture on our end works: we used to have kind of a middle in-memory layer that we’ve removed.”

Spurway described MicroStrategy Desktop Analytics as a kind of trump card: a standalone, desktop-oriented version of the MicroStrategy BI suite — anchored by its Visual Insight tool and designed to address the BI discovery use case. Desktop Analytics can extract data from any ODBC-compliant data source. Like Enterprise Analytics, it’s powered by an integrated in-memory engine.

In other words: a Tableau-killer.

“That [Visual Insight] product has been out there but has always been kind of locked up in our Enterprise product,” he said, acknowledging that MicroStrategy offered Visual Insight as part of its cloud stack, too. “You had to be a MicroStrategy customer who obviously has implemented the enterprise solution, or you could get it through Express, [which is] great for some people, but not everybody wants a cloud-based solution. With [MicroStrategy Desktop Analytics], you go to our website, download and install it, and you’re off and running — and we’ve made it completely free.”

The company’s strategy is that many users will, as Spurway put it, “need more.” He breaks the broader BI market into two distinct segments — with a distinct, Venn-diagram-like area of overlap.

“There’s a visual analytics market. It’s a hot market, which is primarily being driven by business-user demand. Then there’s the traditional business intelligence market, and that market has been there for 20 years. It’s not growing as quickly, and there’s some overlap between the two,” he explained.

“The BI market is IT-driven. For business users, they need speed, they need better ways to analyze their data than Excel provides; they don’t want impediments, they need quick time to value. The IT organization cares about … things … [such as] traditional reporting [and] information-driven applications. Those are apps that are traditionally delivered at large scale and they have to rely on data that’s trusted, that’s modeled.”

If or when users “need more,” they can “step up” to MicroStrategy’s on-premises (Enterprise Analytics) or cloud (Express) offerings, Spurway pointed out. “The IT organization has to support the business users, but they also need to support the operationalization of analytics,” he argued, citing the goal of embedding analytics into the business process. “That can mean a variety of things. It can mean a very simple report or dashboard that’s being delivered every day to a store manager in a Starbucks. They’re not going to need Visual Insight for something like that — they’re not going to need Tableau. They need something that’s simplified for everyday usage.”

MicroStrategy Analytics Powerful

Something More, Something Else

Many in the industry view self-service visual discovery as the culmination of traditional BI.

One popular narrative holds that QlikTech, Tableau, and Spotfire helped establish and popularize visual discovery as an (insurgent) alternative to traditional BI. Spurway sought to turn this view on its head, however: Visual discovery, he claimed, “is a starting point. It draws you in. The key thing that we bring to the table is the capability to bridge the gap between traditional model, single-version-of-the-truth business intelligence and fast, easy, self-service business analytics.”

In Spurway’s view, the usefulness or efficacy of BI technologies shouldn’t be plotted on a linear time-line, e.g., anchored by greenbar reports on the extreme left and culminating in visual discovery on the far right. Visual discovery doesn’t complete or supplant traditional BI, he argued, and it isn’t inconceivable that QlikTech, Tableau, and Spotfire — much like MicroStrategy and all of the other traditional BI powers that now offer visual discovery tools as part of their BI suite — might augment their products with BI-like accoutrements.

Instead of a culmination, Spurway sees a circle — or, better still, a möbius strip: regardless of where you begin with BI, at some point — in a large enough organization — you’re going to traverse the circle or (as with a möbius strip) come out the other side.

There might be something to this. From the perspective of the typical Tableau enthusiast, for example, the expo floor at last year’s Tableau Customer Conference (TCC), held just outside of Washington, D.C. in early September, probably offered a mix of the familiar, the new, and the plumb off-putting. For example, Tableau users tend to take a dim view of traditional BI, to say nothing of the data integration (DI) or middleware plumbing that’s associated with it: “Just let me work already!” is the familiar cry of the Tableau devotee. However, TCC 2013 played host to several old-guard exhibitors — including IBM Corp., Informatica Corp., SyncSort Inc., and Teradata Corp. — as well as upstart players such as WhereScape Inc. and REST connectivity specialist SnapLogic Inc.

These vendors weren’t just exhibiting, either. As a case in point, Informatica and Tableau teamed up at TCC 2013 to trumpet a new “strategic collaboration.” As part of this accord, Informatica promised to certify its PowerCenter Data Virtualization Edition and Informatica Data Services products for use with Tableau. In an on-site interview, Ash Parikh, senior director of emerging technologies with Informatica, anticipated MicroStrategy’s Spurway by arguing that organizations “need something more.” MicroStrategy’s “something more” is traditional BI reporting and analysis; Informatica’s and Tableau’s is visual analytic discovery.

“Traditional business intelligence alone does not cut it. You need something more. The business user is demanding faster access to information that he wants, but [this] information needs to be trustworthy,” Parikh argued. “This doesn’t mean people who have been doing traditional business intelligence have been doing something wrong; it’s just that they have to complement their existing approaches to business intelligence,” he continued, stressing that Tableau needs to complement — and, to some extent, accommodate — enterprise BI, too.

“From a Tableau customer perspective, Tableau is a leader in self-service business intelligence, but Tableau [the company] is very aware of the fact that if they want to become the standard within an enterprise, the reporting standard, they need to be a trusted source of information,” he said.

Among vendor exhibitors at TCC 2013, this term — “trusted information” or some variation — was a surprisingly common refrain. If Tableau wants to be taken seriously as an enterprisewide player, said Rich Dill, a solutions engineer with SnapLogic, it must be able to accommodate the diversity of enterprise applications, services, and information resources. More to the point, Dill maintained, it must do so in a way that comports with corporate governance and regulatory strictures.

“[Tableau is] starting to get into industries where audit trails are an issue. I’ve seen a lot of financial services and healthcare and insurance businesses here [i.e., at TCC] that have to comply with audit trails, auditability, and logging,” he said. In this context, Dill argued, “If you can’t justify in your document where that number came from, why should I believe it? The data you’re making these decisions on came from these sources, but are these sources trusted?”

Mark Budzinski, vice president and general manager with WhereScape, offered a similar — and, to be sure, similarly self-serving — assessment. Tableau, he argued, has “grown their business by appealing to the frustrated business user who’s hungry for data and analytics anyway they can get it,” he said, citing Tableau’s pioneering use of data blending, which he said “isn’t workable [as a basis for decision-making] across the enterprise. You’re blending data from all of these sources, and before you know it, the problem that the data’s not managed in the proper place starts to rear its ugly head.”

Budzinski’s and WhereScape’s pitch — like those of IBM and Teradata — had a traditional DM angle. “There’s no notion of historical data in these blends and there’s no consistency: you’re embedding business rules at the desktop, [but] who’s to say that this rule is the same as the [rule used by the] guy in the next unit. How do you ensure integrity of the data and [ensure that] the right decisions were made? The only way to do that is in some data warehouse-, data mart-[like] thing.”

Stephen Swoyer can be reached at stephen.swoyer@spinkle.net.