Automating Enrichment Jobs


First, make sure the Analytics package is installed. If you aren't sure, run:

    pip install demyst-analytics

Test Data

First, let's create some test data to use in this example. In an IPython environment or in a Python script, execute this code:

    import pandas as pd
    test_df = pd.DataFrame({'email_address': ['', '']})
    # DataFrame.to_dense() was removed in pandas 1.0; to_csv works directly on the frame.
    test_df.to_csv("inputs.csv", index=False, sep=',', encoding='utf-8')

You should end up with a file called inputs.csv that looks like this:



Now that we have some test data, let's build a script to enrich our input file using the Demyst platform. For this example we will use the domain_from_email data product, a test product offered by Demyst that simply splits up the values in the email_address column sent to it.

Let's start by importing the necessary packages.

    import pandas as pd
    from import Analytics

You will need a production API Key from the Demyst Console.

    analytics = Analytics(key='XXXXXX')

If you don't have an API Key yet, you can test using your Username and Password by leaving out the key parameter.

    analytics = Analytics()
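
For an automated job, it is usually preferable not to hard-code the key in the script. The following is a minimal sketch, assuming the key is stored in an environment variable called DEMYST_API_KEY. That variable name and the resolve_api_key helper are hypothetical and not part of the Demyst package; only the two constructor forms shown above come from this guide.

```python
import os


def resolve_api_key(env_var="DEMYST_API_KEY"):
    """Return the API key stored in env_var, or None if it is unset or blank.

    Both the helper and the default variable name are illustrative assumptions.
    """
    value = os.environ.get(env_var, "").strip()
    return value or None


key = resolve_api_key()

# Pass `key` only when one was found, so the call falls back to the
# username/password form (no key parameter) otherwise.
analytics_kwargs = {"key": key} if key else {}

# analytics = Analytics(**analytics_kwargs)
```

With this arrangement the same script can run unchanged on a developer machine (no variable set) and on a scheduler that injects the key.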

Now let's read in our inputs file. Because the CSV file has a header, email_address, that the Demyst platform understands, the file can be used as a dataframe without modification.

    inputs = pd.read_csv('inputs.csv')
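
Since the job depends on the header being one the platform recognizes, an unattended script may want to fail fast when the column is missing. This is a rough sketch using only the standard library csv module; the missing_columns helper is hypothetical, and the only column name taken from this guide is email_address.

```python
import csv


def missing_columns(csv_path, required=("email_address",)):
    """Return the required column names that are absent from the CSV header.

    Only the first row of the file is read, so this stays cheap for large inputs.
    """
    with open(csv_path, newline="", encoding="utf-8") as handle:
        header = next(csv.reader(handle), [])
    return [name for name in required if name not in header]


# Example use before submitting a job:
# problems = missing_columns("inputs.csv")
# if problems:
#     raise SystemExit(f"inputs.csv is missing columns: {problems}")
```

An empty list means every required column is present and the file can be handed to pd.read_csv as shown above.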

To enrich the file, we pass the list of providers along with the input dataframe to the enrich function.

    job_id = analytics.enrich(['domain_from_email'], inputs, validate=False)

The enrich_download function will block until the job is complete and return a dataframe:

    outputs = analytics.enrich_download(job_id)
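
When this runs unattended, it can help to wrap the submit-and-download steps in one function that logs the job ID and how long the blocking download took. The sketch below assumes only the two calls shown above (enrich and enrich_download); the run_enrichment name and the logger name are illustrative, and `analytics` can be any object exposing those two methods.

```python
import logging
import time

logger = logging.getLogger("enrichment_job")


def run_enrichment(analytics, providers, inputs):
    """Submit an enrichment job, wait for it, and return (job_id, outputs)."""
    # validate=False mirrors the call used earlier in this guide.
    job_id = analytics.enrich(providers, inputs, validate=False)
    logger.info("Submitted job %s for providers %s", job_id, providers)

    started = time.monotonic()
    # enrich_download blocks until the job is complete.
    outputs = analytics.enrich_download(job_id)
    logger.info("Job %s finished after %.1f s", job_id, time.monotonic() - started)
    return job_id, outputs


# Example use:
# job_id, outputs = run_enrichment(analytics, ['domain_from_email'], inputs)
```

Returning the job ID alongside the dataframe keeps a record that can be written to a log or passed to a later pipeline stage.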

Lastly, we can take the resulting output dataframe and write it to a file.

    outputs.to_csv('outputs.csv', index=False, sep=',', encoding='utf-8')

The output of this script will be a file called outputs.csv which should look like this:


This output could feed the next stage of an ETL pipeline, or it could be imported into a modeling tool.

The full solution is provided below. If you need help automating a production job, don't hesitate to reach out to

    import pandas as pd
    from import Analytics

    analytics = Analytics()

    inputs = pd.read_csv('inputs.csv')

    job_id = analytics.enrich(['domain_from_email'], inputs, validate=False)

    outputs = analytics.enrich_download(job_id)

    outputs.to_csv('outputs.csv', index=False, sep=',', encoding='utf-8')