Why Data Variety Matters
Data variety is essential for your internal data catalog: the broader the range of datasets you can draw on, the more accurate your data analysis is likely to be. It also helps your staff understand what information is currently available.
Demyst is your one-stop shop for data variety, and in this post, we will see how to populate data.world with our product catalog using our favorite tool, Python.
How to Get Data Variety in Your Internal Catalog
The first thing we are going to do is look at a sample of the product catalog available on Demyst, which is what will be populated into data.world. To do this, we use our Python API and call our, you guessed it, product_catalog() method. You can find more information on our catalog methods and data dictionaries in the documentation for our Demyst Analytics Python package.
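As a rough sketch of that call: the product_catalog() method name comes from this post, but the import path, the no-argument Analytics() constructor, and the shape of the returned catalog are assumptions about the Demyst Analytics package, so check them against its documentation before relying on them.

```python
def fetch_product_catalog(analytics=None):
    """Return the Demyst product catalog.

    Pass an existing client object, or leave it as None to build one.
    """
    if analytics is None:
        # Assumed import path and constructor; credentials are expected
        # to be configured already (not shown here).
        from demyst.analytics import Analytics
        analytics = Analytics()
    # product_catalog() is the method named in this post; its return type
    # is not specified here, so the result is passed through untouched.
    return analytics.product_catalog()
```

Accepting the client as a parameter keeps the function easy to try with a stand-in object before any credentials are in place.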
Heading over to data.world, you will need to enable Python as a valid integration and create an API token. Just like with our Python API, installing the data.world package and configuring it to use the API token was super easy: each step is a one-liner.
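For reference, here is a minimal sketch of that setup, assuming the data.world package is the one published on PyPI as datadotworld and that it can also pick up the token from the DW_AUTH_TOKEN environment variable. The helper name token_in_environment is ours, purely for illustration.

```python
import os

# One-time setup, run from a shell rather than from Python:
#   pip install datadotworld
#   dw configure        # prompts for the API token


def token_in_environment(env=None):
    """Return True if a non-empty DW_AUTH_TOKEN is set in the environment."""
    env = os.environ if env is None else env
    return bool(env.get("DW_AUTH_TOKEN", "").strip())
```

This only checks the environment variable. A token stored by dw configure lives in a config file instead and will not be detected by this helper.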
We now create a dataset, seeded with a sample CSV file. The call gives us back the data.world URL for our dataset.
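A minimal sketch of the dataset creation step follows. It assumes a client obtained with datadotworld.api_client() and that its create_dataset(owner_id, **kwargs) method accepts title and visibility and returns the location of the new dataset. The owner id and title are placeholders, and seeding the dataset with a sample file is left out.

```python
def create_catalog_dataset(client, owner_id, title="Demyst Product Catalog"):
    """Create a private dataset and return whatever the client reports back.

    `client` is expected to behave like the object returned by
    datadotworld.api_client(); the title is a hypothetical placeholder.
    """
    # visibility is assumed to accept "PRIVATE" or "OPEN".
    return client.create_dataset(owner_id, title=title, visibility="PRIVATE")
```

Typical usage would be along the lines of create_catalog_dataset(datadotworld.api_client(), "my-org"), where "my-org" stands in for your own data.world account or organization.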
We can now add each product from Demyst to this dataset as a separate CSV file, or upload the entire product catalog as a single master CSV file. These few lines can be converted into a script that a user can run once or periodically to add all metadata from the Demyst product catalog into their own catalog.
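The master-file variant could be sketched as below. The catalog is assumed to be a list of dictionaries (one per row), the column names are whatever keys those dictionaries carry, and upload_files(dataset_key, files) is assumed to be the upload method on the data.world API client. The dataset key shown in the docstring is a placeholder.

```python
import csv


def write_master_csv(products, path):
    """Write a list of dict rows to one CSV file and return the header used."""
    # Collect column names in first-seen order so rows with differing
    # keys still fit under a single header.
    fieldnames = []
    for row in products:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames, restval="")
        writer.writeheader()
        writer.writerows(products)
    return fieldnames


def upload_catalog(client, dataset_key, path):
    """Upload the CSV to an existing dataset, e.g. 'my-org/demyst-product-catalog'."""
    client.upload_files(dataset_key, [path])
```

Running these two functions from a scheduled job is one way to keep the dataset in step with the Demyst catalog. For one file per product, the same writer can be called once per group of rows.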
Let’s see the dataset we just created.
We now have the complete catalog available to us as an external dataset. This lets us investigate the products in an environment we have already integrated with and share the catalog across our organization. But let’s not stop at ingesting the catalog; let’s also create a project associated with this dataset.
Creating a project linked to our dataset gives us the opportunity to run queries and create insights that can then be shared and discussed. So let’s create a sample query that finds all the NAICS attributes and the Demyst products that provide them in their response.
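To give a feel for such a query, here is a sketch that runs comparable SQL locally with Python's built-in sqlite3 module instead of on data.world. The table name catalog, the columns product_name and attribute_name, and the sample rows are all hypothetical stand-ins for whatever the uploaded catalog file actually contains.

```python
import sqlite3

# Hypothetical schema: one row per (product, attribute) pair.
NAICS_QUERY = """
SELECT product_name, attribute_name
FROM catalog
WHERE LOWER(attribute_name) LIKE '%naics%'
ORDER BY product_name, attribute_name
"""


def find_naics_attributes(rows):
    """Load (product, attribute) pairs into an in-memory table and query it."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE catalog (product_name TEXT, attribute_name TEXT)"
        )
        conn.executemany("INSERT INTO catalog VALUES (?, ?)", rows)
        return conn.execute(NAICS_QUERY).fetchall()
    finally:
        conn.close()


# Made-up rows, only to exercise the query.
sample_rows = [
    ("provider_a", "naics_code"),
    ("provider_a", "business_name"),
    ("provider_b", "naics_description"),
    ("provider_b", "naics_code"),
    ("provider_c", "phone"),
]
matches = find_naics_attributes(sample_rows)
```

On data.world, a similar SELECT would be written against the uploaded file's table. The exact table and column names depend on how the catalog CSV was structured.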
Lastly, let’s see that query saved to our project as an insight, with a visualization built using Chart-Builder, which produces Vega-Lite visualizations. This helps in seeing how many products provide NAICS attributes, and how many of those attributes are NAICS descriptions rather than codes. The visualization might be lite, but don’t let that inhibit you from building more advanced visualizations.
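For readers who prefer to build the chart definition by hand, the sketch below turns (product, attribute) pairs, such as a NAICS query might return, into a small Vega-Lite bar chart specification expressed as a Python dictionary. The field names attribute_name and products are illustrative, and each pair is assumed to appear only once.

```python
from collections import Counter


def naics_bar_chart_spec(matches):
    """Build a Vega-Lite bar chart spec counting products per NAICS attribute."""
    # Each (product, attribute) pair is assumed unique, so counting pairs
    # per attribute gives the number of products providing it.
    counts = Counter(attribute for _product, attribute in matches)
    values = [
        {"attribute_name": name, "products": total}
        for name, total in sorted(counts.items())
    ]
    return {
        "data": {"values": values},
        "mark": "bar",
        "encoding": {
            "x": {"field": "attribute_name", "type": "nominal"},
            "y": {"field": "products", "type": "quantitative"},
        },
    }
```

Serializing the returned dictionary with json.dumps gives a specification that can be pasted into any Vega-Lite editor.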
That was pretty simple, wasn’t it? You can rerun those few lines to update your dataset periodically and create complex insights into hundreds of attributes from Demyst. You can try these steps on our hosted notebook and contact us at email@example.com to let us know how it went.