Attribute Stats on Enriched Data

Data technologies, analytics, and business intelligence have been buzzwords for some time now. They have also become key ingredients in improving decision-making and business value. DemystData provides you with external data, the technology to access it, and the analytical tools to extract value from it.

Demyst’s Python API lets you access external data from a large number of data products, around 160. Once your input dataset has been enriched with external data, you will want to see statistics on the result. A single method in our Python API, report(), does this. It looks through your input and the enriched response from Demyst and returns a list of stats on the products and attributes. Here are some of the stats you get back:

  • match_rate: How often the product found a match for the records in your input dataset. This helps you determine the best products for your use case.
  • fill_rate: How often the attribute is filled with a value. While match_rate is at the data product level, fill_rate is at the attribute level. It tells you how often the attribute will contain a value and whether that is enough for processing. fill_rate is always less than or equal to match_rate.
  • type: The type of the attribute response. It will be one of boolean, object, int, or float.
  • nunique: The number of unique values the attribute contains. Imagine seeing 52 unique values for the state attribute! Unique counts give you insight into the attribute's values and can flag data that needs scrubbing.

[Image: stats - enriched data]
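To make the stats above concrete, here is a minimal pandas sketch that computes comparable numbers on a small enriched table. It is not Demyst's report() implementation. The helper attribute_stats, the "<product>.<attribute>" column naming, the product names, and the rule that a row counts as matched when any attribute of that product is filled are all assumptions made for illustration.

```python
import pandas as pd


def attribute_stats(enriched: pd.DataFrame) -> pd.DataFrame:
    """Per-attribute stats for an enriched table (hypothetical helper)."""
    # Assumption: columns are named "<product>.<attribute>".
    products = enriched.columns.str.split(".").str[0]
    filled = enriched.notna()

    # Assumption: a row matched a product if any of its attributes is filled.
    match_rate = {
        p: filled.loc[:, products == p].any(axis=1).mean() * 100
        for p in products.unique()
    }

    return pd.DataFrame({
        "product": products,
        "attribute": enriched.columns,
        "match_rate": [match_rate[p] for p in products],  # % of rows matched
        "fill_rate": (filled.mean() * 100).to_numpy(),    # % of rows filled
        "type": enriched.dtypes.astype(str).to_numpy(),
        "nunique": enriched.nunique().to_numpy(),         # distinct non-null values
    })


# Made-up sample data; product and attribute names are hypothetical.
enriched = pd.DataFrame({
    "firmographics.state": ["NY", "CA", None, "NY"],
    "firmographics.employee_count": [10.0, None, None, 25.0],
    "registry.is_active": [True, False, True, None],
})

report = attribute_stats(enriched)
print(report)
```

Under this definition of a match, fill_rate can never exceed match_rate, which is the relationship described in the list above.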

Now that you have the statistics, you will want to retain the subset of attributes to feed to your decision engine. You can filter the attributes on these stats through another method, query(). In the image below, the attributes are filtered on three stats: a match_rate of at least 80%, a fill_rate of 50%, and at least 2 unique values.

[Image: stats query]
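The same kind of filter can be sketched with pandas' own DataFrame.query on a stats table. This is only an illustration: whether Demyst's query() accepts the same expression syntax is an assumption, the thresholds are treated as lower bounds, and the attribute names and numbers below are made up.

```python
import pandas as pd

# Made-up stats table with the columns described above.
stats = pd.DataFrame({
    "attribute": ["state", "employee_count", "is_active", "country"],
    "match_rate": [92.0, 92.0, 75.0, 88.0],
    "fill_rate": [90.0, 40.0, 70.0, 88.0],
    "nunique": [48, 310, 2, 1],
})

# Keep attributes with match_rate >= 80%, fill_rate >= 50%, and 2+ unique values.
selected = stats.query("match_rate >= 80 and fill_rate >= 50 and nunique >= 2")
print(selected["attribute"].tolist())
```

Here employee_count is dropped for its low fill_rate, is_active for its low match_rate, and country because a single unique value carries no signal.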

 

You now have a select few quality attributes. You can evaluate their relative importance for your use case with any modeling technique, or customize the data however you want, with the tools and the power to access and manipulate data from Demyst!

Harshit Singh
