Attribute Stats on Enriched Data

Data technologies, analytics, and business intelligence have been buzzwords for some time now. They have also become key ingredients in improving decision-making and business value. DemystData provides you with external data, the technology to access it, and the analytical tools to extract value from it.

Demyst’s Python API lets you access external data from a large number of data products, around 160 in all. Once your input dataset has been enriched with external data, you will want to see statistics on it. You can get these through a single method, report(), in our Python API. The method looks through your input and the enriched response from Demyst and provides a list of stats on the products and attributes. Here are some of the stats you get back:

  • match_rate: How often the data product found a match for the records in the given input dataset. This helps you determine the best products for your use case.
  • fill_rate: How often the attribute is filled with a value. While match_rate is at the data product level, fill_rate is at the attribute response level. It tells you how often the attribute contains a value, and so whether there will be enough values for processing. fill_rate will always be less than or equal to match_rate.
  • type: The type of the attribute response. It will be one of boolean, object, int or float.
  • nunique: The number of unique values that the attribute contains. Imagine seeing 52 unique values for the state attribute! Unique counts give you insight into the attribute values and can also help with data scrubbing, if any is needed.

[Image: stats on enriched data]
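
To make the definitions above concrete, here is a minimal sketch that computes similar stats with plain pandas. It is not the implementation behind report(); the DataFrame, its column names, the attribute_stats helper, and the rule that a row with no values counts as unmatched are all assumptions made for illustration.

```python
import pandas as pd

# Hypothetical enriched response: one row per input record, one column per attribute.
# Assumption: a row where every attribute is missing stands for an unmatched input.
enriched = pd.DataFrame({
    "state": ["NY", "CA", None, "NY", None],
    "employee_count": [12.0, None, None, 40.0, None],
    "is_active": [True, False, None, True, None],
})


def attribute_stats(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row of stats per attribute (hypothetical helper)."""
    total = len(df)
    # Product-level figure: percentage of rows with at least one returned value.
    matched = df.notna().any(axis=1).sum()
    return pd.DataFrame({
        "match_rate": 100.0 * matched / total,
        # Attribute-level figure: percentage of rows where this attribute has a value.
        "fill_rate": 100.0 * df.notna().sum() / total,
        "type": df.dtypes.astype(str),
        # Count of distinct non-missing values per attribute.
        "nunique": df.nunique(),
    })


stats = attribute_stats(enriched)
print(stats)
```

Because an attribute can only be filled on a matched record, fill_rate in this sketch can never exceed match_rate, which mirrors the relationship described in the list.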

Now that you have the statistics, you will want to keep a subset of the attributes to feed into your decision engine. You can filter the attributes on these stats through another method, query(). In the image below, you can see the attributes being filtered on three stats: a match_rate of at least 80%, a fill_rate of at least 50%, and at least 2 unique values.

[Image: stats query]
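
The same three thresholds can be sketched with pandas' own DataFrame.query, used here only as a stand-in since the exact signature of the Demyst query() method is not shown in this post. The stats table, its attribute names, and its numbers are made up for illustration.

```python
import pandas as pd

# Hypothetical stats table; attribute names and values are invented for this example.
stats = pd.DataFrame(
    {
        "match_rate": [92.0, 92.0, 75.0, 88.0],
        "fill_rate": [81.0, 35.0, 70.0, 64.0],
        "nunique": [48, 12, 9, 1],
    },
    index=["state", "employee_count", "industry", "country"],
)

# Keep attributes with match_rate >= 80%, fill_rate >= 50%, and at least 2 unique values.
selected = stats.query("match_rate >= 80 and fill_rate >= 50 and nunique >= 2")
kept_attributes = selected.index.tolist()
print(kept_attributes)
```

An attribute with a single unique value carries no signal for a model, which is why the nunique threshold sits alongside the two rate thresholds.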


You now have access to a select few quality attributes. These attributes can be evaluated for their relative importance to your use case using any modeling technique. Or you can customize the data the way you want, with the tools and the power to access and manipulate data from Demyst!

Harshit Singh

