How To Import And Refine Data in Watson Analytics
If you have been following up on my previous articles like Real-time mobile data analytics and Discovering the business insights using Watson Analytics, I am going to give you better understanding on what variety of data or dataset requirements exists in the platform, and how you can refine the same to generate precise insights using the cognitive capabilities of Watson.
Importing Data Into Watson Analytics
As this series is specifically for beginners, who possibly have no prior experience using any business intelligence tool, it is important to understand how data should be structured, and how the quality of the data impacts the result of the insights.
Before Importing Data
It is essential to understand that excel sheets/databases that we import must have a basic structure that is readable and easy to make sense of. Most of excel sheets used for business purposes have a lot of calculations and formulae working out producing dynamic content. Such sheets also have to format like company name or big header in the first row, and some sub headings followed by actual data that matters.
So having a well-optimized data set will allow you to produce valuable insights. For the best practices, your first row must have important headers like Customer Name, Purchase Item, Discount, Profit, Earnings etc.
Make sure the data has readable headers like for cricket match sample that I talked about in the last article, column header/title should be ‘average bowling speed’ instead of ‘avg_bwl_speed’. Sure, both of them can produce same results, but when you are using NLP technology to create questions in natural language, you will have to ask questions like ‘What is the relationship of avg_bw_speed with Dew_factor‘ instead of much better and natural sounding question like ‘What is the relationship of average bowling speed with Dew factor?’
Ideal excel data sheet should look like the following IPL match data.
You can make these adjustments prior to uploading using shape data feature and smooth out further, using the refine feature after you import the data.
Click on New Data from the Watson Analytics Dashboard to upload the data from options like online storage, DB connects or Local. To upload the excel files in XLS or CSV format, you can choose the option local > Select your file > Click on Import.
Once the import is complete which takes about 1 to 3 minutes, you will the name of the file being assigned to the name of the dataset.
Superstore Sales is my recently added data set, and as you can see it’s already at 75% quality, much better than the average data quality of30-50%. Note that even when you have poor data quality, Watson Analytics can still process and provide you comprehensive insights, but business Analyst strives for higher accuracy, and I will show you exactly is how you achieve it.
Poor datasets have missing values or mixed data types in single columns and overall bad representation of data. You have to use refine data to achieve this, which I will share using
The data quality score shows how suitable your data is for generating forecasts. The score is an average of the data quality score for every field or column in the data set, as determined by missing values, constant values, imbalance, influential categories, outliers, and overall skewness.
To preview you want to identify how you can preview, filter, and customize the data that will be available in your new data set, you can use the Shape Data feature prior to uploading.You can do this by connecting your corporate database and clicking on Shape Before instead of Import.
With this you can not only preview the data available on the table, you can also preview quality score in each column.
Once you identify the data type of the columns, Type Conversions can help you convert data types if needed. With a specific column selected, you can perform many of the same tasks using the Operations tab in the operations, history, details pane, such as Type conversions, and also multiple data cleansing operations. Note that history remains for that particular session to access, so you can undo the effects of any bad operation choice.
After shaping the data, you can upload it Watson Analytics and examine it further with the Refine feature.
Refine Data Features of Watson Analytics
You can rename, delete, move and replace the data set and even sort it using the refine feature of the Watson Analytics. Data can also be enriched by adding calculations and filters in the available data.
The same data quality scores by column are available in refine feature, so you can fix the missing data, or sort the better data and details the fields or respective rows that might downgrade the data quality.
You can add calculations and perform operations on data available. To achieve this click the three dots on your data set tile in the dashboard > Click Refine.
Click on the ‘+’ on the top right of the columns and select calculations.
This will open up a new dialogue for performing the calculation. In following sample data on twitter, I am creating a column for adding a number of Author Favorite Tweet with Retweet numbers.
You can change the name of the column, and perform the calculation. And following is the result of my total engagement calculation in the last column. You can perform a number of calculations using the function drop-down menu on two or more number of columns.
Similar way as calculations you can create data groups for further refinement of data, like in following the example I have grouped the followers from 10 non-English speaking countries.
Technically data groups avail you an option to separate and view data into logical containers using familiar terminology. These data groups can be used as starting points to discover the insights just like the column headers.
You can click on any column headers to create data groups, here in following the example we are estimating reach of twitter profiles based on a number of followers. I changed the default sorting by double clicking the numbers. You can choose custom or equal distribution, and use the already intelligently sorted distribution created by Watson Analytics. Once you save the Data group it will be available for you to use in the discovery dashboard.
Watson Analytics automatically create a hierarchy of data, if it can the relationship between values that have multiple levels. A simple example would be If you have dates, Watson can create and sort by hierarchies like Yea >Month > Day. You can fine tune your results using the pre-set hierarchies, or set up new hierarchies on your own similar to data grouping.
Hierarchy structure offers you the ease to navigate through large amounts of data, allowing you to drill up and down to perform analysis at different levels of granularity. Following is a simple example of how a pre-populated hierarchy is being used in the discovery to finding deep data.
So this is how Watson Analytics offers you a bundle of highly intelligent tools that come in handy for business data analysis. They are so intuitive to use, you can even use it for generating business intelligence even for small or medium businesses.
Watson Analytics is available as a free 30-day trial, and you can use all of these featured mentioned above to test how your business can benefit from the cognitive capabilities of Watson Platform and powerful tools provided by the Watson Analytics Platform.