Automating Enrichment Jobs

Install

First, make sure the Analytics package is installed. If you aren’t sure, run
this:

:::bash
pip install demyst-analytics

Test Data

First, let’s create some test data to use in this example. In an IPython environment
or in a Python script, execute this code:

:::python
import pandas as pd
test_df = pd.DataFrame({'email_address': ['test@test.com', 'test2@test.com']})
test_df.to_csv("inputs.csv", index=False, sep=',', encoding='utf-8')

You should end up with a file called inputs.csv that looks like this:
email_address
test@test.com
test2@test.com

Automation

Now that we have some test data, let’s build a script to enrich our input file
using the Demyst platform. For this test we will use the domain_from_email
data product, a test product Demyst offers that simply splits up the
email_address column sent to it.
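
To make the expected result concrete, here is a minimal local sketch of the kind of split this product performs. It is an illustration only, not Demyst's implementation: the function name split_email is hypothetical, and the user and host keys simply mirror the output columns shown later in this guide.

```python
def split_email(email_address):
    """Split an email address into its user and host parts (illustration only)."""
    # rpartition splits on the last '@', so the host is everything after it.
    user, sep, host = email_address.rpartition('@')
    if not sep or not user or not host:
        raise ValueError(f"not a valid email address: {email_address!r}")
    return {'user': user, 'host': host}


print(split_email('test@test.com'))  # {'user': 'test', 'host': 'test.com'}
```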

Let’s start by importing the necessary packages.

:::python
import pandas as pd
from demyst.analytics import Analytics

You will need a production API Key from the Demyst Console.
:::python
analytics = Analytics(key='XXXXXX')

If you don’t have an API Key yet, you can test using your Username and Password
by leaving out the key parameter.
:::python
analytics = Analytics()
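
For an automated job you may prefer not to hardcode the key in the script. The sketch below reads it from an environment variable and falls back to the username and password flow when the variable is unset. The variable name DEMYST_API_KEY and the helper analytics_kwargs are assumptions made for this example, not names defined by the Demyst package.

```python
import os


def analytics_kwargs(env=None):
    """Build keyword arguments for Analytics() from the environment."""
    env = os.environ if env is None else env
    # DEMYST_API_KEY is a hypothetical variable name chosen for this sketch.
    key = env.get('DEMYST_API_KEY')
    # With no key set, return no arguments so Analytics() is called bare.
    return {'key': key} if key else {}


# Usage, assuming demyst-analytics is installed:
#   from demyst.analytics import Analytics
#   analytics = Analytics(**analytics_kwargs())
```

Passing the environment in as an argument keeps the helper easy to test without touching the real process environment.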

Now let’s read in our inputs file. Because the header of our CSV file, email_address,
is a column name the Demyst platform understands, the file can be used as a
dataframe without modification.
:::python
inputs = pd.read_csv('inputs.csv')
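
A scheduled job can fail fast when the input file lacks the expected column, rather than submitting a job that cannot succeed. The following is a minimal sketch using only the standard library; the helper name require_columns is hypothetical and is not part of the Demyst package.

```python
import csv


def require_columns(path, required):
    """Raise ValueError if the CSV at `path` lacks any column in `required`."""
    with open(path, newline='', encoding='utf-8') as f:
        # The first row of the file is treated as the header.
        header = next(csv.reader(f), [])
    missing = [name for name in required if name not in header]
    if missing:
        raise ValueError(f"{path} is missing columns: {missing}")
    return header
```

Calling require_columns('inputs.csv', ['email_address']) before the enrich step would stop the script early if the header is wrong.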

To enrich the file, we pass the list of providers along with the input dataframe
to the enrich function.
:::python
job_id = analytics.enrich(['domain_from_email'], inputs, validate=False)

The enrich_download function will block until the job is complete and return
a dataframe:
:::python
outputs = analytics.enrich_download(job_id)

Lastly, we can take the resulting output dataframe and write it to a file.
:::python
outputs.to_csv('outputs.csv', index=False, sep=',', encoding='utf-8')

The output of this script will be a file called outputs.csv which should look like this:
inputs.email_address,domain_from_email.row_id,domain_from_email.client_id,domain_from_email.host,domain_from_email.user,domain_from_email.error
test@test.com,0,,test.com,test,
test2@test.com,1,,test.com,test2,

This output could feed the next stage of an ETL pipeline, or it could be imported
into a modeling tool.
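
Before handing the file to a downstream stage, it can help to check the domain_from_email.error column. The sketch below collects rows whose error field is non-empty, using the sample output from this guide as inline data. The helper name rows_with_errors is hypothetical, and treating any non-empty value as a failure is an assumption about how the column is populated.

```python
import csv
import io

# Sample output copied from this guide; a real job would open outputs.csv instead.
SAMPLE = """inputs.email_address,domain_from_email.row_id,domain_from_email.client_id,domain_from_email.host,domain_from_email.user,domain_from_email.error
test@test.com,0,,test.com,test,
test2@test.com,1,,test.com,test2,
"""


def rows_with_errors(csv_file, error_column='domain_from_email.error'):
    """Return the rows whose error column is non-empty."""
    reader = csv.DictReader(csv_file)
    return [row for row in reader if (row.get(error_column) or '').strip()]


failed = rows_with_errors(io.StringIO(SAMPLE))
print(len(failed))  # 0 for the sample above
```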

The full solution is provided below. If you need help automating a production job,
don’t hesitate to reach out to support@demystdata.com.

:::python
import pandas as pd
from demyst.analytics import Analytics

analytics = Analytics()

inputs = pd.read_csv('inputs.csv')

job_id = analytics.enrich(['domain_from_email'], inputs, validate=False)
outputs = analytics.enrich_download(job_id)

outputs.to_csv('outputs.csv', index=False, sep=',', encoding='utf-8')
