Tag Archives: using

Using the R Integration functionality, how to perform Text Mining on a MicroStrategy report and display the result

Readers:Jaime Perez

Here is another great post in the MicroStrategy Community from Jaime Perez (photo, right) and his team. A lot of work when into the preparation of this post and it shows some great ways to use the “R” integration with MicroStrategy.

Contributors from Jaime’s team include:

ssonobe  ssonobe

Lili 

Joanne A 

Ohingst  Ohingst

Enjoy!

Michael

Text Mining Using R Integration in MicroStrategy

Users may wish to perform text mining using R on the result of any arbitrary MicroStrategy report and display the result. One of the problems that hinders the users from achieving it is that the number of output elements is not always consistent. For example, a report may have three attributes named ‘Age groups’, ‘Reviewer’, and ‘Survey feedback’ and the report might display four rows of feedback as follows:

01.jpg

If the above report result is sent to R as an input and the R script breaks down each sentence of the feedback into the term frequency that is grouped by the age groups, it will have 18 rows.

02.jpg

Since the number of output elements is greater than the number of the MicroStrategy report rows, the report execution will fail. Using the objects in the Tutorial project, this technical note (TN207734) describes one way to display the result of text mining on a MicroStrategy report, using the R integration functionality.

PREMISE:
– Following the instructions in TN43665, the MicroStrategy R Integration Pack has already been installed on the Intelligence Server.

The Steps Involved

STEP 1: Decide on the input values that need to be sent to R via R metrics
The first step is to decide on which data you wish to perform text mining. In this technical note, the sample report will let users select one year element, the arbitrary number of category elements, and specify the Revenue amount in prompts. The report will then display the value of the normalized TF-IDF (term frequency and inverse document frequency) for every word showing up in the qualified Item attribute elements, grouped by the Category elements.

A user may select the following values for each prompt and the report may look as shown below.

  • Year: 2012
  • Category: Books, Movies, and Music
  • Revenue: greater than $15,000

03.jpg

Eventually, the user may want to see the normalized TF-IDF for every word showing up in the Item attribute elements as shown below:

04.jpg

Since the final output displays each word from the Item attribute and it is grouped by the Category elements, the necessary input values to R are as follows

  • The elements of the Category attribute.
  • The elements of the Item attribute.

 

STEP 2: Create metrics to pass the input values to R

The input values to R from MicroStrategy must be passed via metrics. Hence, on top of the current grid objects, additional metrics need to be created. For this sample report, since the inputs are the elements of two attributes, create two metrics with the following definitions so that the elements are displayed as metrics.

Max(Category@DESC) {~}

Max(Item@DESC) {~}

05.jpg

 

STEP 3: R script – Phase 1: Define input and output variables and write R script to obtain what you wish to display in a MicroStrategy report

In the R script, define (1) a variable that receives the inputs from MicroStrategy and (2) a variable that will be sent back to MicroStrategy as the output as depicted below. Since the number of output elements must match with the number of input elements, it is defined as “output = mstrInput2” to avoid the errors. In other words, this script executes R functions to obtain the data that you wish to display in a MicroStrategy report, but the output is the same as the input. More details about how to display the result in a MicroStrategy report will be followed up later in this technical note.

06.jpg

 

In this technical note, after manipulating the input value, we assume that the variable named ‘norm.TF.IDF’ in the R script holds the values of the TF-IDF for each term.

07.jpg

 

STEP 4: Create tables in the data warehouse to store the value of your R output

In order to display the values of the ‘norm.TF.IDF’ defined in a MicroStrategy report, tables to hold the result need to be created in the data warehouse. In other words, additional report will later have to be created in MicroStrategy and it will extract the data from the database tables, which are created in this section.

In this specific example, the variable ‘norm.TF.IDF’ has the elements of words (terms) and categories and the values of the normalized TF-IDF. Considering the types of data, the first two should be displayed as attributes and the values of the normalized TF-IDF should be presented in a metric. Hence, two lookup tables to hold the term and category elements and one fact table need to be created to store all the data. On top of these tables, one relationship table is also required since the relationship between words and categories is many-to-many.

 

STEP 5: R script – Phase 2: Populate the tables in your R script

As previously mentioned, the variable named ‘norm.TF.IDF’ contains the values, which a user wishes to display in a MicroStrategy report as shown below.

07.jpg

 

In this R script, four more variables are defined from ‘norm.TF.IDF’, each of which contains the subset of data that will be inserted into the database tables.

 

tm_Category holds the unique elements of the Category.

10.jpg

 

tm_Word holds the unique elements of the Word (Term).

11.jpg

 

tm_Word_Cat stores the values of the many-to-many relationship.

12.jpg

 

tm_Fact contains the values of TF-IDF for every Word-Category combination.

13.jpg

 

In the R script, populate the database tables with the above four subsets of ‘norm.TF.IDF’.

# Load RODBC
library(RODBC)

# RODBC package: assign ch the connectivity information
ch <- odbcConnect("DSN_name")

# Delete all the rows of the tables
sqlClear(ch, "tm_Category", errors = TRUE)
sqlClear(ch, "tm_Word",     errors = TRUE)
sqlClear(ch, "tm_Word_Cat", errors = TRUE)
sqlClear(ch, "tm_Fact",     errors = TRUE)

# SQL: insert the data into tables; use parameterized query
sqlSave(ch, tm_Category, tablename = "tm_Category", rownames=FALSE, append=TRUE, fast = TRUE)
sqlSave(ch, tm_Word,  tablename = "tm_Word", rownames=FALSE, append=TRUE, fast = TRUE)
sqlSave(ch, tm_Word_Cat, tablename = "tm_Word_Cat", rownames=FALSE, append=TRUE, fast = TRUE)
sqlSave(ch, tm_Fact, tablename = "tm_Fact", rownames=FALSE, append=TRUE, fast = TRUE)

#Close the channel
odbcClose(ch)

 

STEP 6: Create and add an R metric, which implements the R script

The R script is done. It is time to implement this R script from MicroStrategy by creating an R script. In the deployR interface, open the R script and define the input and output that you specify in Step 3 as follows. Since the elements of the Category and Item attributes are characters, choose “String” as its data type. Likewise, since the output is the same as the mstrInput2, its data type is also set to string.

14.jpg

 

Create a stand-alone metric and paste the metric definition of the deployR utility. Then, replace the last parameters by the Category and Item metrics that you created in Step 2.

15.jpg

 

Add the R metric to the report.

15.2.png

 

The report and R will perform the following actions after adding the R metric
i. The report lets users select the prompt answers
ii. MicroStrategy sends the Category and Item elements to R via the R metric
iii. R performs text mining to calculate the TF-IDF based on the inputs
iv. R generates subsets of the TF-IDF
v. R truncates the database tables and populates them with the subset of the TF-IDF
vi. R sends the output(which is actuary the input) to MicroStrategy
vii. The report displays the values of all object including the R metric

 

STEP 7: Create MicroStrategy objects to display the data

From the tables created in Step 4, create the Word and Category attributes and the fact named weight. The object relationship is as depicted below.

08.jpg

09.jpg

 

Now, create a new report with these objects. This report will obtain and display the data from the database tables.

16.jpg

 

STEP 8: Utilize the report level VLDB properties to manipulate the order of the report execution jobs

There are currently two reports and let each of which to be named R1 and R2 as described below

  • R1: A report which prompts users to specify the report requirements and implements the R script executing text mining
  • R2: This report obtains the result of text mining from the database and display it

 

If the two reports are placed in a document as datasets as shown below, there is one problem: R2 may start its execution before R1 populates the database tables with the result of text mining.

17.jpg

 

In order to force R2 to execute its job after the completion of R1, the VLDB properties PRE/POST statements along with additional database table may be used. The table tm_Flag contains the value of 0 or 1. R2 is triggered when R1 sets the value of completeFlag to 1. The detailed steps are described below with the script for SQL Server.

 

i. Create another table in the database, which holds the value of 1 or 0

CREATE TABLE tm_Flag
(
   completeFlag int
)


INSERT INTO tm_Flag VALUES(0)

 

ii. In the VLDB property ‘Report Post Statement 1” of the R1 report, defines a Transact-SQL statement that changes the value of completeFlag to the value of 1.

DECLARE @query as nvarchar(100)
SET @query = 'UPDATE tm_Flag SET completeFlag = 1'
EXEC sp_executesql @query

 

iii. Define the VLDB property ‘Report Pre Statement 1’ in R2 so that it will check the value of completeFlag every second and loop until it turns to 1. After the loop, it will revert the value of completeFlag back to 0. After this Report Pre Statement, R2 will obtain data from the database, which has been populated by R1.

DECLARE @intFlag INT
SET @intFlag = (select max(completeFlag) from tm_Flag)

WHILE(@intFlag = 0)
BEGIN
	WAITFOR DELAY '00:00:01'
	SET @intFlag = (select max(completeFlag) from tm_Flag)
END

DECLARE @query as nvarchar(100)
SET @query = 'UPDATE tm_Flag SET completeFlag = 0'
EXEC sp_executesql @query

 

Activity Diagram

18_revised.png

 

 

Overall execution flow

  1. Answer prompts

19.png

 

2. Only the text mining result is displayed to users

20.png

 

Third Party Software Installation:

WARNING: The third-party product(s) discussed in this technical note is manufactured by vendors independent of MicroStrategy. MicroStrategy makes no warranty, express, implied or otherwise, regarding this product, including its performance or reliability.

 

Bryan Brandow: Triggering Cubes & Extracts using Tableau or MicroStrategy

trigger-720x340

bryan-headshots-004Bryan Brandow (photo, right), a Data Engineering Manager for a large social media company, is one of my favorite bloggers out their in regards to thought leadership and digging deep into the technical aspects of Tableau and MicroStrategy. Bryan just blogged about triggering cubes and extracts on his blog. Here is a brief synopsis.

One of the functions that never seems to be included in BI tools is an easy way to kick off an application cache job once your ETL is finished. MicroStrategy’s Cubes and Tableau’s Extracts both rely on manual or time based refresh schedules, but this leaves you in a position where your data will land in the database and you’ll either have a large gap before the dashboard is updated or you’ll be refreshing constantly and wasting lots of system resources. They both come with command line tools for kicking off a refresh, but then it’s up to you to figure out how to link your ETL jobs to call these commands. What follows is a solution that works in my environment and will probably work for yours as well. There are of course a lot of ways for your ETL tool to tell your BI tool that it’s time to refresh a cache, but this is my take on it. You won’t find a download-and-install software package here since everyone’s environment is different, but you will find ample blueprints and examples for how to build your own for your platform and for whatever BI tool you use (from what I’ve observed, this setup is fairly common). Trigger was first demoed at the Tableau Conference 2014. You can jump to the Trigger demo here.

I recommend you click on the link above and give his blog post a full read. It is well worth it.

Best regards,

Michael

Data Blending: Why are Some Metric Values Blank in Documents Using Multiple Datasets in MicroStrategy Analytics Enterprise 9.4.1 (Part 8)

MicroStrategy Analytics Enterprise

Introduction

Starting with MicroStrategy Analytics Enterprise 9.4.1, Report Services documents can contain grids with objects coming from more than one dataset.

Multiple Datasets in a Single Grid/Graph/Widget Object in MicroStrategy Web 9.4 [2]

Users now have the ability to add attributes and/or metrics from multiple datasets to a single grid, graph, or widget. For example, if Dataset #1 contains Category and Revenue and Dataset #2 contains Category and Profit, a grid can be created which contains Category, Revenue, and Profit.

Part 8 - 1

Administrators can control the use of multiple datasets in a single grid, graph, or widget through the Analytical Engine VLDB properties window at the project level.

  1. Right mouse click (RMC) on the project name.
  2. Select Project Configuration.
  3. Click on Project Definition.
  4. Select ‘Advanced’.
  5. Click “Configure” under the Analytical engine VLDB.

Part 8 - 2

NOTE:

The default value is set to: “Objects in document grids must come from the grid’s source dataset only”.

Users can set the set the source of the grid to a particular dataset or choose no dataset (in which case, the MicroStrategy engine will determine the best suited dataset). [1]

The MicroStrategy Analytical Engine displays no data for metrics in ambiguous cases or when there is a conflict. Ambiguous cases can arise in cases where multiple datasets contain the same objects.  Examples based on the MicroStrategy Tutorial project have been provided to explain this information.

Note: When the MicroStrategy Analytical Engine cannot resolve the correct datatset as explained in the cases below, the data displayed for these will correspond to the value chosen for the missing object display under Project Configuration > Report definition > Null values > Missing Object Display. The default value for this blank.

Case1:

Multiple datasets have the same metric. Only one dataset does not contain this metric and this dataset is set as the source of the grid.

This case is explained with an example based on the MicroStrategy Tutorial project.

1. Create the following objects:

a. Dataset DS1 with the attribute ‘Year’ and metric ‘Profit’.

b. Dataset DS2 with the attribute ‘Year’ and metrics ‘Profit’, ‘Revenue’.

c. Dataset DS3 with the attribute ‘Quarter’ and metric ‘Cost’.

2.  Create a document based on the above datasets and create a grid object on the document with the following objects: ‘Year’, ‘Quarter’, ‘Profit’. Set the source of this grid to be the dataset ‘DS3’.

3. In the executed document, no data is displayed for the metric ‘Profit’ as shown below.

Part 8 - 3

In the above example, the metric ‘Profit’ does not exist in the source dataset ‘DS3’ and exists in more than dataset which are in the document i.e., it exists in both ‘DS1’ and ‘DS2’. Since the engine cannot just randomly pick one of the two available datasets, it chooses not to display any data for this metric. If users do not want such blank columns to be displayed, set the source dataset so that such ambiguity does not arise.

Case 2:

The same metric exists multiple times on the grid. For example, users can have a smart compound metric and a component metric of this compound smart metric on the grid in the document. The smart metric and the component metric are from different datasets.

This case is also explained with an example based on the MicroStrategy Tutorial project.

1. Create the following objects:

a. Dataset DS1 with attribute ‘Year’ and metric ‘Profit’.

b. Dataset DS2 with attribute ‘Year’ and metrics ‘Revenue’, ‘Profit’, ‘Profit Margin’ (this is a compound smart metric calculated from metrics Revenue and Profit).

2. Create a document based on the above datasets and create a grid object on the document with the following objects: ‘Year’, ‘Revenue’, ‘Profit’ and ‘Profit Margin’. The source of this grid object is set to DS1.

3. In the executed document, no data is displayed for the metric ‘Profit Margin’, as shown below.

Part 8 - 4

In the above example, since the source of the dataset is set to ‘DS1’, the ‘Profit’ metric is sourced from this dataset and the metric ‘Revenue’ is sourced from the dataset ‘DS2’ (as this is the ONLY datatset with this metric). However, for the metric ‘Profit Margin’, the component metric ‘Profit’ exists on dataset ‘DS1’, so this becomes a conflict metric and is not displayed. If the source of the grid is changed to ‘DS2’, the data is displayed correctly as shown below.

Part 8 - 5

References:

[1] MicroStrategy Knowledgebase, Why are some metric values blank in documents using multiple datasets in MicroStrategy Analytics Enterprise 9.4.1, TN Key: 44517, 12/16/2013, https://resource.microstrategy.com/support/mainsearch.aspx.

[2] MicroStrategy Knowledgebase, Multiple datasets in a single grid/graph/widget object in MicroStrategy Web 9.4, TN Key: 44944, 09/30/2013, https://resource.microstrategy.com/support/mainsearch.aspx.

NOTE: You may need to register to view MicroStrategy’s Knowledgebase.

Interview Question #5: Analysis Using Enterprise Manager

Question

If you want to analyze the average length of time users must wait for their documents or reports to process, as well as the number of errors that were received in Enterprise Manager, which area of analysis would enable you to track this information?

A.   Performance Analysis

B.   Operations Analysis

C.   Real-Time Analysis

D.   Project Analysis

E.   You cannot accomplish this with Enterprise Manager

Answer

B. Operations Analysis

The Operations Analysis folder in Enterprise Manager contains the following analysis areas, each with its own reports:

  • Concurrency analysis (including user/session analysis)
  • Data load
  • Delivery processing analysis
  • Inbox Message Analysis
  • Report processing analysis
  • Resource utilization analysis (including top consumers)

MicroStrategy Course Where You Will Learn About This Topic

MicroStrategy Administration: Application Management Course

An Introduction to Data Blending – Part 6 (Data Blending using MicroStrategy)

Readers:

In Part 5 of this series on data blending, we reviewed Tableau’s Data Blending Architecture. With Part 5, I have wrapped up the Tableau portion of this series.

I am now going to post, over the next week or so, several parts discussing how we do data blending using MicroStrategy. Fortunately, MicroStrategy just publish a nice technical note on their Knowledgebase (TN Key: 46940) [1] discussing this. Most of what I am sharing today is derived from that technical note.

I probably will have 2-4 parts for this topic in my Data Blending series including how the MicroStrategy Analytical Engine deals with multiple datasets.

I want to thank Kristi Morton (et al) for the wonderful research paper she wrote at The University of Washington [2]. It helped me provide some real insight into the topic and mechanics of data blending, particularly with Tableau. You can learn more about Ms. Morton’s research as well as other resources used to create this blog post by referring to the References at the end of the blog post.

So, let’s now dig into how MicroStrategy provides us data blending capabilities.

Best Regards,

Michael

Data Blending using MicroStrategy

In Part 6, we will begin examining using data blending in MicroStrategy. We will first look at how to use attributes from multiple datasets in the same Visual Insight dashboard and link them to existing attributes using the Data Blend feature in MicroStrategy Analytics Enterprise Web 9.4.1.

Prior to v9.4.1 of MicroStrategy, data blending was referred to as Cube Joining.

In MicroStrategy Analytics Enterprise Web 9.4.1, the new Report Services Documents Engine automatically links common attributes using the modeled schema whenever possible. The manual linking is not allowed between different modeled attributes. Just in case the requirement needs to link different attributes, this can be done by using MicroStrategy Architect at the schema level. The join behavior by default for linking related attributes is done using a full outer join. In case there is no relationship between the attributes, then a cross join is used.

The manual attribute linking can be done as shown in the images below.

Part 6 - 1

 

2. Browse the file to match the existing data and select Continue.

Part 6 - 2

 

3. Set the attribute forms if needed. MicroStrategy will automatically assign the detected ones.

Part 6 - 3

4. The attributes can be mapped manually by selecting Link to Project Attribute.

Part 6 - 4

5. Select the attribute form that matches the desired join:

Part 6 - 5

6. The attribute should appear similar to the ones existing in the schema as shown below.

Part 6 - 6

 

7. Save the recently created dataset.

Part 6 - 7

8. Now there are two cubes used as datasets in the same Visual Insight dashboard, as shown below.

Part 6 - 7a

Automatic Linking

The attributes icons now have a blue link, as shown below. This indicates that MicroStrategy has automatically linked them to elements in the Information dataset.

Part 6 - 8

Next: How Data Blending Affects the Analytical Engine’s Behavior in MicroStrategy

———————————————————————————-

References:

[1] MicroStrategy Knowledgebase, How to use attributes from multiple datasets in the same Visual Insight dashboard and link them to existing attributes using the Data Blend feature in MicroStrategy Analytics Enterprise Web 9.4.1, TN Key: 46940, 04/24/2014, https://resource.microstrategy.com/support/mainsearch.aspx.

NOTE: You may need to register to view MiroStrategy’s Knowledgebase.

[2] Kristi Morton, Ross Bunker, Jock Mackinlay, Robert Morton, and Chris Stolte, Dynamic Workload Driven Data Integration in Tableau, University of Washington and Tableau Software, Seattle, Washington, March 2012, http://homes.cs.washington.edu/~kmorton/modi221-mortonA.pdf.

Tips & Tricks #7: How to Enable or Disable Match Case Sensitivity for Prompts using MicroStrategy Web

Readers:

Believe it or not, this is a question I get asked a lot by clients. Based on their requirements or data, many clients do not want the Match Case Sensitivity check box preference enabled to begin with when they are prompted. Fortunately for us, the solution to this is fairly simple as I show below.

If you have a MicroStrategy question you would like me to answer or blog about, please e-mail it to me (include screenshots if it will help) to michael@dataarchaeology.net. I will do my best to answer your questions.

Best Regards,

Michael

Match Case Sensitivity Preference

When you run certain prompt types in MicroStrategy Web, the interface offers a search option followed by a Match Case Sensitivity check box.

This functionality allows users to search for specific elements within the prompt’s available answers. To allow even more flexibility, the check box allows the user to restrict the search results only to those that match the case entered. In the screenshot below, you can see the check box is enabled (checked) and case sensitivity will be matched.

Tip7-0

However, you can control whether this option is enabled or disabled by default for all prompts by setting the following preference in MicroStrategy Web.

This option is under Preferences, Project Defaults, Prompts.

Don’t forget to press the Apply button for the changes to take place.

Tip7-1a

Now, in the screenshot below, you can see the check box is disabled (unchecked) and case sensitivity will not be matched.

Tip7-2