We are going to walk through all of the available methods in the Demyst Analytics Python package. This will give you a broad overview of the features and capabilities of the package.

Analytics

The Analytics class drives all of the methods that help you access external data. Generally, you want to instantiate a separate Analytics object for each data study.

Examples

    • Username & Password Authentication
      The best way to get started is to let the toolkit prompt you for your username and password. If you don’t have credentials you can sign up here.
          from demyst.analytics import Analytics
      
          # If you don't pass in any parameters, you will be prompted for username and password.
          analytics = Analytics()
    • Key-based Authorization
      For non-interactive scripts, use the key parameter to pass in your API key.
          from demyst.analytics import Analytics
      
          # Pass in your API key with the key parameter.
          analytics = Analytics(key="XXXXXXXXXXXXXXXXXXX")

More details on Analytics()

    • class Analytics(**kwargs)
      Argument      Default   Notes
      inputs        {}        Default input DataFrame to use
      region        "us"      Which of the global edges to use: us, sg, au
      username      None      If None, you are prompted for it
      password      None      If None, you are prompted for it
      sample_mode   True      Return test data; set to False for live mode
      config_file   None      Config file that stores these options
      key           None      API key, for non-interactive use
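For non-interactive scripts you may prefer not to hard-code the key. Here is a minimal sketch, assuming you keep the key in an environment variable. The variable name DEMYST_API_KEY and the helper analytics_kwargs are our own hypothetical choices and are not part of the package. The helper only builds the keyword arguments documented above, which you would then pass as Analytics(**analytics_kwargs()).

```python
import os

# Hypothetical environment variable name; the package does not define it.
KEY_VARIABLE = "DEMYST_API_KEY"


def analytics_kwargs(env=None, live=False):
    """Return keyword arguments for Analytics(), read from the environment."""
    env = os.environ if env is None else env
    # sample_mode=False switches from test data to live data.
    kwargs = {"sample_mode": not live}
    key = env.get(KEY_VARIABLE)
    if key:
        # The documented `key` parameter avoids the interactive prompt.
        kwargs["key"] = key
    return kwargs


# With the package installed:
#   from demyst.analytics import Analytics
#   analytics = Analytics(**analytics_kwargs())
```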

.input_files

Lists the available sample input files. You can use them for search or enrichment, either as they are or filtered, to try out the methods in this package.

Examples

    • List and query sample input files
      Listing the hosted input files and querying one of them on an attribute:
          from demyst.analytics import Analytics
      
          analytics = Analytics()
      
          # This prints the available input files
          analytics.input_files()
      
          # This returns a subset of an input file (as a dataframe)
          analytics.input_file('us_business_entity', 100, {"post_code": "94123"})
      
      The resulting dataframe looks like:
      
                      city  post_code  country         street
          0  San Francisco      94123       US  2953 Baker St
          1  San Francisco      94123       US  1628 Union St

More details on analytics.input_file()

    • analytics.input_file(filename, row_limit=None, filters=None)
      Argument    Default   Notes
      filename    None      Required; name of the sample input file
      row_limit   None      Number of rows to return
      filters     None      Column header and value to filter on, as a dict

      Results: Returns the sample input file, filtered and limited as requested, as a dataframe that can be used for search or enrichment.
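The exact matching rules for filters are not spelled out here. A minimal sketch of one plausible reading, assuming each key in filters is a column name whose value must match exactly, and that row_limit is applied after filtering. The function filter_rows and the sample rows are hypothetical, and the sketch works on plain lists of dicts rather than dataframes.

```python
def filter_rows(rows, row_limit=None, filters=None):
    """Keep rows whose columns equal every filter value, then cap the count."""
    filters = filters or {}
    matched = [
        row for row in rows
        if all(row.get(column) == value for column, value in filters.items())
    ]
    return matched if row_limit is None else matched[:row_limit]


# Illustrative rows; post codes are kept as strings, as in the example above.
rows = [
    {"city": "San Francisco", "post_code": "94123", "street": "2953 Baker St"},
    {"city": "San Francisco", "post_code": "94123", "street": "1628 Union St"},
    {"city": "New York", "post_code": "10001", "street": "100 Main St"},
]
print(filter_rows(rows, 100, {"post_code": "94123"}))
```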

.load_input

Automatically infers the input types in a CSV file and returns a cleaner dataframe with Demyst Types that can be used for enrichment. Alternatively, you can load your input file with pandas and use validate if you prefer to confirm conformance with Demyst Types manually.

This method is in beta. Use it to get a dataframe ready for enrichment without having to read the file, validate the types, and make the necessary changes yourself. It can currently detect the following inputs:

  • First Name
  • Last Name
  • Full Name
  • Street address
  • City
  • State
  • Postcode
  • Country
  • Full Address
  • Business Name
  • Phone number
  • Email address
  • Booleans
  • IP address
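load_input's detection logic is not documented here. To illustrate the general idea, here is a minimal sketch that guesses a Demyst type for a column from its values using a few regular expressions. The patterns, the 80% threshold, and the guess_type helper are our own assumptions, not the rules load_input actually uses, and they cover only three of the types listed above.

```python
import re

# Illustrative patterns only; checked in this order.
PATTERNS = {
    "email_address": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "post_code": re.compile(r"^\d{5}(-?\d{4})?$"),
    "phone": re.compile(r"^\+?1?[\s.-]?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}$"),
}


def guess_type(values, threshold=0.8):
    """Return the first type whose pattern matches most non-empty values."""
    filled = [v.strip() for v in values if v and v.strip()]
    if not filled:
        return None
    for type_name, pattern in PATTERNS.items():
        hits = sum(1 for v in filled if pattern.match(v))
        if hits / len(filled) >= threshold:
            return type_name
    return None


print(guess_type(["johndoe@gmail.com", "janeausten@gmail.com"]))
```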

Examples

    • Loading a CSV File into Demyst Types
      Loading a CSV file with different column types and getting a dataframe of Demyst Types:
          from demyst.analytics import Analytics
      
          analytics = Analytics()
      
          # This returns the input dataset as a dataframe with inferred Demyst Types
          analytics.load_input("path_to_file.csv")
      
      The resulting dataframe looks like:
      
             first_name  last_name         email_address
          0        John        Doe     johndoe@gmail.com
          1        Jane     Austen  janeausten@gmail.com

.sample_data

Returns sample data for a provider. You can also limit the number of rows and filter on the values of a column.

Examples

    • Show sample data
      This fetches some sample data for the google_latlon provider.
          from demyst.analytics import Analytics
      
          analytics = Analytics()
      
          analytics.sample_data('google_latlon', 5, { "state": "CA" })
      
      The resulting dataframe looks like:
      
                          street                 city    state  post_code  country  good
          9     4910 Castana Ave Apt 8       California    CA      90712      US     1
          0   652 N. Marengo Ave. #202       California    CA      91101      US     1
          1     1303 W 168th St. Apt.9       California    CA      90247      US     1
          2          14666 Hiawatha St       California    CA      91345      US     1
          3       9689 Saint George St       California    CA      91977      US     1
          8       2739 1/2 E Monroe St          Indiana    CA      90810      US     1

More details on analytics.sample_data()

    • analytics.sample_data(provider, row_limit=None, filters=None)
      Argument    Default   Notes
      provider    None      Required; name of the data provider
      row_limit   None      Number of rows to return
      filters     None      Column header and value to filter on, as a dict

      Results: Returns sample data similar to what an enrichment with this provider would return.

.validate

Checks whether the input dataframe’s column names and values would be accepted by the Demyst system.
You can run this as a quick preflight check before kicking off an enrichment job.

Examples

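    • Validating CSVs
      Reading a CSV file with pandas and validating it. The phone and post_code columns are read as strings (object dtype) so that pandas does not parse them as numbers, which would drop any leading zeros.
          from demyst.analytics import Analytics
          import pandas as pd
      
          analytics = Analytics()
      
          inputs = pd.read_csv('inputs.csv',
                               dtype={'phone': object, 'post_code': object})
      
          analytics.validate(inputs)
      
      The output looks like:
      
                   phone  post_code
          0  15555555555      10010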

More details on analytics.validate()

    • analytics.validate(inputs, providers=None, notebook=True)
      Argument    Default   Notes
      inputs      None      Required, unless provided to Analytics()
      providers   []        List of data products to validate against
      notebook    True      If True, produce an HTML report; if False, return a Boolean

      Results: If `notebook` is True, returns an HTML object suitable for display in a Jupyter notebook. Otherwise, returns a Boolean indicating whether validation succeeded.
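validate checks your data against the Demyst service itself. As a rough local complement, a sketch like the following flags column headers that are not Demyst type names before you call it. The set of names is a subset copied from the Types table at the end of this page, and unknown_columns is a hypothetical helper; whether the service rejects or ignores such columns is for validate to decide.

```python
# A subset of the type names from the Types table; extend as needed.
DEMYST_TYPES = {
    "business_name", "city", "country", "email_address", "first_name",
    "full_name", "last_name", "phone", "post_code", "state", "street",
}


def unknown_columns(columns):
    """Return the column names that are not recognized Demyst type names."""
    return [name for name in columns if name not in DEMYST_TYPES]


print(unknown_columns(["phone", "post_code", "zip"]))
```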

.search

Looks for providers that can return data for the provided inputs. Use this when you have some data and want to see which of our data providers might be able to use it. The column headers of the input data must be Demyst types. You can also search the product catalog without input data, for example by category, as in the second example below.

Examples

    • Searching providers
      Searching for providers that can use the columns of an input file, and for providers in a category:
          from demyst.analytics import Analytics
          import pandas as pd
      
          analytics = Analytics()
      
          inputs = pd.read_csv('inputs.csv', dtype={'phone': object, 'post_code': object})
      
          # This outputs a nicely formatted list of providers to the notebook
          analytics.search(inputs)
      
          # This outputs providers in the business category
          analytics.search("business")

More details on analytics.search()

    • analytics.search(inputs=None, tags=None, view="html", strict=False)
      Argument    Default   Notes
      inputs      None      Required, unless provided to Analytics()
      tags        None      List of tags to search for
      view        "html"    Set to "json" for JSON output, or "dataframe" for a table of products
      strict      False     If True, only return providers for which all inputs are present

      Results: If view="json", returns a list of result objects; if view="dataframe", returns a table of products; otherwise returns an HTML object suitable for display in a Jupyter notebook.
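The strict flag decides how much of a provider's input must be present in your data. A minimal sketch of one reading, assuming a hypothetical catalog that maps each provider to the input types it uses: with strict=True a provider is kept only if all of its inputs appear among your columns, and otherwise (our assumption) any overlap is enough. The provider names and inputs below are made up for illustration.

```python
# Hypothetical catalog: provider name -> input types it can use.
CATALOG = {
    "provider_a": {"email_address"},
    "provider_b": {"street", "city", "state", "post_code"},
    "provider_c": {"email_address", "phone"},
}


def matching_providers(columns, strict=False):
    """Return providers whose inputs match the given column names."""
    available = set(columns)
    matches = []
    for provider, needed in CATALOG.items():
        if strict:
            ok = needed <= available  # every input must be present
        else:
            ok = bool(needed & available)  # at least one input is present
        if ok:
            matches.append(provider)
    return sorted(matches)


print(matching_providers(["email_address"], strict=True))
```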

.attribute_search

Looks for data providers whose responses contain the given attribute.
If you need a certain attribute and want to know which providers have it, use attribute_search. It lists all of the providers that include that attribute in their response.

Examples

    • Searching for an Attribute
      In this example, we look for providers that return the NAICS (North American Industry Classification System) code of a business:
          from demyst.analytics import Analytics
      
          analytics = Analytics()
      
          # This returns a dataframe listing the matching providers
          analytics.attribute_search(name="naics")
      
      The resulting dataframe containing providers and attribute names looks like:
      
                 attribute                      provider
          0    naics_codes       experian_business_facts
          1  primary_naics  equifax_austin_tetra_details
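The example suggests that the search matches attribute names containing the search term, since naics matches both naics_codes and primary_naics. A minimal sketch of that behavior, assuming case-insensitive substring matching over a hypothetical catalog; the provider names and attribute lists below are illustrative, not real Demyst products.

```python
# Hypothetical catalog: provider name -> attribute names in its response.
ATTRIBUTES = {
    "provider_a": ["naics_codes", "business_name"],
    "provider_b": ["primary_naics", "sic_code"],
    "provider_c": ["email_age"],
}


def find_attribute(name):
    """Return (attribute, provider) pairs whose attribute contains name."""
    term = name.lower()
    return [
        (attribute, provider)
        for provider, attributes in ATTRIBUTES.items()
        for attribute in attributes
        if term in attribute.lower()
    ]


print(find_attribute("NAICS"))
```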

.sample_enrich

This method is in beta. It combines load_input, search, and enrich_and_download so you can see a sample response in one step. Given a CSV file, it:

  • Infers the Demyst Types
  • Searches for matching data products using the inferred types
  • Runs an enrichment with all of the matching data products, using only the first thirty rows

Examples

    • Sample enrichment of an input file
      With an input file in CSV format, we see the matching data products and the enrichment of the first thirty rows by those products:
          from demyst.analytics import Analytics
      
          analytics = Analytics()
      
          # This prints the inferred input types and the enriched response as dataframes,
          # and the matching data products in HTML format
          analytics.sample_enrich("path_to_file.csv")
      
      The resulting dataframe containing the inferred input types looks like:
      
             first_name  last_name         email_address
          0        John        Doe     johndoe@gmail.com
          1        Jane     Austen  janeausten@gmail.com
      
      And the dataframe containing the enriched response looks like:
      
            inputs.email_address  domain_from_email.row_id domain_from_email.client_id  \
          0      foo@example.com                         0
          1        test@test.com                         1
      
            domain_from_email.host domain_from_email.user domain_from_email.error
          0            example.com                    foo
          1               test.com                   test

.enrich_and_download

Augments your input data with results from our data providers. This is the main entry point to the Demyst data platform. enrich_and_download is a convenience wrapper around the lower-level enrich, enrich_wait, and enrich_download methods. We recommend that you start with enrich_and_download and switch to those methods later, for example when you have lots of data to process.

Examples

    • Enriching an input dataframe
      This example uses enrich_and_download to augment an input dataframe of email addresses with our built-in domain_from_email data provider, which simply splits each address into username and hostname and returns them.
          from demyst.analytics import Analytics
          import pandas as pd
      
          analytics = Analytics()
      
          inputs = pd.DataFrame.from_dict([
              { "email_address": "foo@example.com" },
              { "email_address": "test@test.com" }
          ])
      
          # Here we only use a single data provider, but you can pass in
          # any number of data provider names to use.
          results = analytics.enrich_and_download(["domain_from_email"], inputs)
          print(results)
      
      The resulting dataframe looks like this:
      
            inputs.email_address  domain_from_email.row_id domain_from_email.client_id  \
          0      foo@example.com                         0
          1        test@test.com                         1
      
            domain_from_email.host domain_from_email.user domain_from_email.error
          0            example.com                    foo
          1               test.com                   test

      Note that your input column email_address was mirrored back in a prefixed form as `inputs.email_address`.

      The columns starting with `domain_from_email` were added by the data provider. While this example is somewhat contrived, it shows the basic workings of enrichment: you pass in a dataframe and the names of some providers, and get back a dataframe containing additional data from those providers.
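To make the column naming concrete, here is a local sketch that mimics the shape of the output above: each input column is mirrored back with an inputs. prefix, and the provider's fields are prefixed with its name. It does not call the platform. The split rule (everything after the last @ is the host) and the helper name mock_domain_from_email are our assumptions, and the sketch omits the client_id and error columns.

```python
def mock_domain_from_email(rows):
    """Imitate the shape of an enriched result for a list of input dicts."""
    results = []
    for row_id, row in enumerate(rows):
        user, _, host = row["email_address"].rpartition("@")
        # Mirror every input column back under the "inputs." prefix.
        out = {f"inputs.{column}": value for column, value in row.items()}
        # Prefix the provider's own fields with the provider name.
        out["domain_from_email.row_id"] = row_id
        out["domain_from_email.host"] = host
        out["domain_from_email.user"] = user
        results.append(out)
    return results


for record in mock_domain_from_email([
    {"email_address": "foo@example.com"},
    {"email_address": "test@test.com"},
]):
    print(record)
```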

More details on analytics.enrich_and_download()

    • analytics.enrich_and_download(providers, inputs, validate=True, all_updates=False, hosted_input=None)
      Argument       Default     Notes
      providers      []          List of provider names to query
      inputs         (required)  Inputs to pass to providers
      validate       True        Perform validation before enrichment
      all_updates    False       Include historical data in results
      hosted_input   None        Use a sample input file instead of the provided inputs

      Results: Returns the enriched dataframe.

.enrich

enrich is the lower-level workhorse behind enrich_and_download. It kicks off an enrichment job asynchronously and immediately returns a job ID, which you can use with our other methods:

  • Manually check the status of the job with enrich_status.
  • Wait for the job to finish with enrich_wait.
  • Download the results with enrich_download. You can even download partial results while the job is still running.

Use enrich for long-running jobs with real data. If you’re just getting started, we recommend enrich_and_download, which runs synchronously and does all of this for you.

Examples

    • Manual control over enrichment
      We’re re-using the example from enrich_and_download, but with `enrich`, which doesn’t block the notebook, so we can keep working while the enrichment is in progress.
      
          from demyst.analytics import Analytics
          import pandas as pd
      
          analytics = Analytics()
      
          inputs = pd.DataFrame.from_dict([
              { "email_address": "foo@example.com" },
              { "email_address": "test@test.com" }
          ])
      
          # This kicks off the job... once it prints the job ID you can continue working.
          job_id = analytics.enrich(["domain_from_email"], inputs)
      
          # If you want to inquire about the status of the job, do the following.
          # This will print some status information and return True if the job is finished.
          finished = analytics.enrich_status(job_id)
      
          # You can also wait for the job to finish:
          analytics.enrich_wait(job_id)
      
          # Now we're ready to download the data:
          results = analytics.enrich_download(job_id)

More details on analytics.enrich()

    • analytics.enrich(providers, inputs, validate=True, all_updates=None, hosted_input=None)
      Argument       Default     Notes
      providers      []          List of provider names to query
      inputs         (required)  Inputs to pass to providers
      validate       True        Perform validation before enrichment
      all_updates    False       Include historical data in results
      hosted_input   None        Use a sample input file instead of the provided inputs

      Results: Returns the ID of the started enrichment job.

.enrich_status

enrich_status returns True if an enrichment job created with enrich is complete and False if it’s still running. It also prints some information about job progress.

Examples

See the example for enrich.

More details on analytics.enrich_status()

    • analytics.enrich_status(id)
      Argument   Default   Notes
      id         None      Job ID from enrich()

      Results: Returns True if the job is complete, False if it’s still running.

.enrich_wait

enrich_wait waits until an enrichment job created with enrich is complete. It’s similar to calling enrich_status repeatedly until it returns True.
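If you want a wait with a timeout rather than an open-ended one, a minimal polling sketch relies only on what is documented here: enrich_status(id) returns True once the job is complete. The wait_for_job helper and its poll_seconds and timeout_seconds parameters are hypothetical, and the status check is passed in as a function so the sketch runs without the package.

```python
import time


def wait_for_job(check_status, job_id, poll_seconds=10.0, timeout_seconds=3600.0):
    """Poll check_status(job_id) until it returns True or the timeout passes."""
    deadline = time.monotonic() + timeout_seconds
    while not check_status(job_id):
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still running after {timeout_seconds}s")
        time.sleep(poll_seconds)


# With the package installed you would call:
#   wait_for_job(analytics.enrich_status, job_id)
```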

Examples

See the example for enrich.

More details on analytics.enrich_wait()

    • analytics.enrich_wait(id)
      Argument   Default   Notes
      id         None      Job ID from enrich()

      Results: None.

.enrich_download

enrich_download downloads the augmented data of an enrichment job created with enrich and returns the resulting dataframe.
By default, enrich_download will wait until the results are complete, but it also lets you download partial results while the job is still running. To do this, pass block_until_complete=False to enrich_download.

Examples

    • Manual control over enrichment
      We’re re-using the example from enrich_and_download, but use `enrich` which doesn’t block the notebook and thus allows us to keep working while the enrichment is in progress. Once the enrichment is done, we use enrich_download to retrieve the results.
      
          from demyst.analytics import Analytics
          import pandas as pd
      
          analytics = Analytics()
      
          inputs = pd.DataFrame.from_dict([
              { "email_address": "foo@example.com" },
              { "email_address": "test@test.com" }
          ])
      
          # This kicks off the job... once it prints the job ID you can continue working.
          job_id = analytics.enrich(["domain_from_email"], inputs)
      
          # If you want to inquire about the status of the job, do the following.
          # This will print some status information and return True if the job is finished.
          finished = analytics.enrich_status(job_id)
      
          # You can also wait for the job to finish:
          analytics.enrich_wait(job_id)
      
          # Now we're ready to download the data:
          results = analytics.enrich_download(job_id)

More details on analytics.enrich_download()

    • analytics.enrich_download(id, block_until_complete=True)
      Argument               Default   Notes
      id                     None      Job ID from enrich()
      block_until_complete   True      If True, wait for all providers to finish; if False, download partial results

      Results: Returns the enriched dataframe.

.enrich_with_hosted_inputs

Like enrich_and_download, but instead of enriching an input dataframe, it uses a sample input file (see input_files).
As with input_file, the sample input can be filtered and limited to a certain number of rows.

Examples

    • Enrich with sample input files
      This example enriches the us_business_entity sample input with the domain_from_email provider, without supplying a dataframe of your own.
      
          from demyst.analytics import Analytics
      
          analytics = Analytics()
      
          # This prints the enriched result using the `us_business_entity` inputs
          analytics.enrich_with_hosted_inputs(['domain_from_email'], 'us_business_entity')

More details on analytics.enrich_with_hosted_inputs()

    • analytics.enrich_with_hosted_inputs(providers, hosted_input, row_limit=None, filters=None)
      Argument       Default     Notes
      providers      (required)  List of provider names
      hosted_input   (required)  Name of the sample input file
      row_limit      None        Number of rows to use from the sample input
      filters        None        Column header and value to filter the sample input on

      Results: Returns the enriched dataframe.

.enrich_download_to_disk

enrich_download_to_disk downloads the augmented data of an enrichment job created with enrich and saves it as a CSV file on disk.

Use this instead of enrich_download if your outputs are very large.

Examples

    • Download enrichment to disk
      We’re re-using the example from enrich, which doesn’t block the notebook and thus allows us to keep working while the enrichment is in progress.
      
          from demyst.analytics import Analytics
          import pandas as pd
      
          analytics = Analytics()
      
          inputs = pd.DataFrame.from_dict([
              { "email_address": "foo@example.com" },
              { "email_address": "test@test.com" }
          ])
      
          # This kicks off the job... once it prints the job ID you can continue working.
          job_id = analytics.enrich(["domain_from_email"], inputs)
      
          # Download the enriched dataset to output.csv on your disk
          analytics.enrich_download_to_disk(job_id, "output.csv")

More details on analytics.enrich_download_to_disk()

    • analytics.enrich_download_to_disk(id, file_path, overwrite=False, block_until_complete=True)
      Argument               Default     Notes
      id                     (required)  Job ID from enrich()
      file_path              (required)  Path of the output CSV file
      overwrite              False       If True, overwrite the output file if it exists; if False, abort if the file exists
      block_until_complete   True        If True, wait for all providers to finish; if False, download partial results
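The overwrite flag guards against replacing an existing file. A minimal sketch of that guard with the standard csv module, assuming the results are already available as a list of dicts; write_rows_csv is a hypothetical helper, not the package's implementation. Opening in mode "x" makes the "abort if the file exists" check atomic.

```python
import csv


def write_rows_csv(rows, file_path, overwrite=False):
    """Write a list of dicts to CSV; refuse to replace a file unless overwrite."""
    # Mode "x" raises FileExistsError if the file already exists.
    mode = "w" if overwrite else "x"
    fieldnames = list(rows[0]) if rows else []
    with open(file_path, mode, newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
```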

.enrich_credits

enrich_credits prints information about the cost of running an enrichment. Use this to see how many credits a job would take before running it. It has the same parameters as enrich.

Examples

    • Getting credit information
      Here we’re re-using the example from enrich, but instead of actually running the job, we just print how many credits it would take.
          from demyst.analytics import Analytics
          import pandas as pd
      
          analytics = Analytics()
      
          inputs = pd.DataFrame.from_dict([
              { "email_address": "foo@example.com" },
              { "email_address": "test@test.com" }
          ])
      
          # Don't actually run the job, just print how many credits it would take.
          print(analytics.enrich_credits(["domain_from_email"], inputs))

More details on analytics.enrich_credits()

    • analytics.enrich_credits(providers, inputs, validate=True)
      Argument    Default   Notes
      providers   []        List of provider names to query
      inputs      None      Inputs to pass to providers
      validate    True      Perform validation before enrichment

      Results: Returns the number of credits that running the job would cost.

.products

products returns information about each of our data providers as a dataframe.

Examples

    • Listing data providers
      This example shows how to list all or some data providers.
          from demyst.analytics import Analytics
          a = Analytics()
          # You can either get information about all providers...
          a.products()
          # ...or some providers, by specifying their names:
          a.products(["domain_from_email", "email_age"])

More details on analytics.products()

    • analytics.products(product_names)
      Argument         Default   Notes
      provider_names   []        A list of product names to return

      Results: Returns a dataframe with information about data providers.

.product_catalog

product_catalog returns information about the inputs and outputs of a data provider as a dataframe. You can also get this information for all of our data providers.

Examples

    • Getting information about data providers
      This example shows how to get input and output information for some or all data providers.
          from demyst.analytics import Analytics
          a = Analytics()
          # Call it like this to get info about particular providers...
          a.product_catalog(["domain_from_email", "email_age"])
          # ...or like this to get info about all providers:
          a.product_catalog(all_products=True)

More details on analytics.product_catalog()

    • analytics.product_catalog(provider_names=[], all_products=False)
      Argument         Default   Notes
      provider_names   []        A list of product names to return
      all_products     False     Set to True to get information about all available products

      Results: Returns information about inputs and outputs of providers as a dataframe.

.product_inputs

product_inputs is like product_catalog, but returns only the inputs of data providers.

Examples

    • Getting information about data providers
      This example shows how to get information about the inputs of some data providers.
          from demyst.analytics import Analytics
          a = Analytics()
          a.product_inputs(["domain_from_email", "email_age"])

More details on analytics.product_inputs()

    • analytics.product_inputs(provider_names=[], all_products=False)
      Argument         Default   Notes
      provider_names   []        A list of product names to return
      all_products     False     Set to True to get information about all available products

      Results: Returns information about inputs of providers as a dataframe.

.product_outputs

product_outputs is like product_catalog, but returns only the outputs of data providers.

Examples

    • Getting information about data providers
      This example shows how to get information about the outputs of some data providers.
          from demyst.analytics import Analytics
          a = Analytics()
          a.product_outputs(["domain_from_email", "email_age"])

More details on analytics.product_outputs()

    • analytics.product_outputs(provider_names=[], all_products=False)
      Argument         Default   Notes
      provider_names   []        A list of product names to return
      all_products     False     Set to True to get information about all available products

      Results: Returns information about outputs of providers as a dataframe.

.product_stats

product_stats accepts a list of data product names and returns a dataframe of performance metrics and metadata for each field of those products.

Examples

    • Getting performance statistics for three products
      This example shows how to get product stats on each output field for dnb_find_company, housecanary_property_details, and infutor_property_append.
          from demyst.analytics import Analytics
      
          analytics = Analytics()
          providers = ["dnb_find_company", "housecanary_property_details", "infutor_property_append"]
          stats = analytics.product_stats(providers)
          print(stats)
      
          # Alternatively, you can return results for all providers:
          analytics.product_stats(all_products=True)

The resulting dataframe looks like the following:

             input_entity     stats_updated_on           product  \
    0  us_business_entity  2019-10-04 07:06:37  dnb_find_company   
    1  us_business_entity  2019-10-04 07:06:37  dnb_find_company   
    2  us_business_entity  2019-10-04 06:22:46  dnb_find_company   
    3  us_business_entity  2019-10-04 08:44:23  dnb_find_company   
    4  us_business_entity  2019-10-04 02:43:14  dnb_find_company   
    
       product_error_rate  product_match_rate  \
    0                 0.0            0.540441   
    1                 0.0            0.521989   
    2                 0.0            1.000000   
    3                 0.0            1.000000   
    4                 0.0            1.000000   
    
                                attribute_flattened_name \
    0  find_company_response_detail.candidate_matched...  
    1  find_company_response_detail.candidate_returne...  
    2  find_company_response_detail.find_candidate[0]...  
    3  find_company_response_detail.find_candidate[0]...  
    4  find_company_response_detail.find_candidate[0]...  
    
       attribute_fill_rate  \
    0            0.540441   
    1            0.521989   
    2            1.000000   
    3            1.000000   
    4            1.000000   
    
       attribute_consistency_rate  attribute_unique_values  \
    0                    0.494737                       92   
    1                    0.494737                       23   
    2                    0.000000                       99   
    3                    0.000000                        1   
    4                    0.960000                      261   
    
                                      most_common_values \
    0              {'1': 117, '3': 14, '2': 25, '4': 12}  
    1    {'1': 117, '25': 53, '3': 14, '2': 25, '4': 12}  
    2  {'10': 18, '1': 14, '3': 10, '2': 23, '5': 15,...  
    3                                         {'1': 267}  
    4                                                 {}  
    
                 std  median  \
    0  187614.338032   323.0   
    1       7.353355    12.0   
    2  656121.473294   240.0   
    3       0.000000     1.0   
    4            NaN     NaN   
    
               mean  max_value      variance \
    0  53636.032609   950391.0  3.519914e+10  
    1     12.565217       25.0  5.407183e+01  
    2  96321.010101  6540000.0  4.304954e+11  
    3      1.000000        1.0  0.000000e+00  
    4           NaN        NaN           NaN  
    
       min_value  attribute_onboarded_date  \
    0        1.0  2018-07-26T20:01:22.000Z   
    1        1.0  2018-07-26T20:01:22.000Z   
    2        1.0  2018-07-26T20:01:22.000Z   
    3        1.0  2018-07-26T20:01:22.000Z   
    4        NaN  2018-07-26T20:01:22.000Z   
    
      attribute_audited_date  attribute_pii  \
    0    2019-10-04 07:06:37              0   
    1    2019-10-04 07:06:37              0   
    2    2019-10-04 06:22:46              0   
    3    2019-10-04 08:44:23              0   
    4    2019-10-04 02:43:14              0   
    
                                      attribute_use_case  
    0  Address Verification, Business Contact, Busine...  
    1  Address Verification, Business Contact, Busine...  
    2  Address Verification, Business Contact, Busine...  
    3  Address Verification, Business Contact, Busine...  
    4  Address Verification, Business Contact, Busine...

More details on analytics.product_stats()

    • analytics.product_stats(providers)
      Argument         Default   Notes
      provider_names   []        List of provider names to view stats for

      Results: Returns the performance data and metadata of the products’ fields.

.report

report accepts the dataframe returned by the enrichment methods, which contains your inputs (as inputs.* columns) alongside the enriched data, and provides statistics at the product and attribute level. Each row describes one attribute in the response: its type, fill rate, and number of unique values, plus the match rate of the product it belongs to.

    • Getting statistics from the enriched data
      This example shows how to get stats on each attribute of the data enriched by seon_email and neutrino_email_verify.
          from demyst.analytics import Analytics
          import pandas as pd
      
          analytics = Analytics()
      
          inputs = pd.DataFrame.from_dict([
              { "email_address": "foo@example.com" },
              { "email_address": "test@test.com" }
          ])
      
          providers = ["seon_email", "neutrino_email_verify"]
          result = analytics.enrich_and_download(providers, inputs)
          stats = analytics.report(result)
          print(stats.head(5))
      
      The resulting dataframe looks like the following:
      
                      product_name product_match_rate attribute_name  \
          0                 inputs             100.00  email_address   
          1  neutrino_email_verify             100.00      client_id   
          2  neutrino_email_verify             100.00         domain   
          3  neutrino_email_verify             100.00   domain_error   
          4  neutrino_email_verify             100.00  email_address   
          
            attribute_fill_rate attribute_type  unique_values  \
          0              100.00         object              2   
          1                0.00         object              0   
          2              100.00         object              2   
          3              100.00           bool              1   
          4              100.00         object              2   
          
                                     most_common_values  cardinality   \
          0  {"test@test.com": 1, "foo@example.com": 1}       100.00  
          1                                     {"": 2}          nan    
          2           {"example.com": 1, "test.com": 1}       100.00    
          3                                {"false": 2}        50.00   
          4  {"test@test.com": 1, "foo@example.com": 1}       100.00  
          
             std  median  mean max_value  min_value  variance  
          0  NaN     NaN   NaN       NaN        NaN       NaN  
          1  NaN     NaN   NaN       NaN        NaN       NaN  
          2  NaN     NaN   NaN       NaN        NaN       NaN  
          3  0.0     0.0   0.0       0.0        0.0       0.0  
          4  NaN     NaN   NaN       NaN        NaN       NaN
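As a rough illustration of the attribute-level figures, here is a sketch that computes fill rate, unique values, most common values, and cardinality for one column. The definitions are inferred from the sample output above and are our assumptions: empty strings and None count as unfilled, fill rate is the percentage of filled values, and cardinality is unique values as a percentage of filled values (nan when nothing is filled). attribute_stats is a hypothetical helper that takes a plain list of scalar values.

```python
import math
from collections import Counter


def attribute_stats(values):
    """Summarize one column the way the report output appears to."""
    filled = [v for v in values if v not in ("", None)]
    unique = len(set(filled))
    total = len(values)
    return {
        "attribute_fill_rate": 100.0 * len(filled) / total if total else math.nan,
        "unique_values": unique,
        # Empty values are still counted here, as in the client_id row above.
        "most_common_values": dict(Counter(values).most_common()),
        "cardinality": 100.0 * unique / len(filled) if filled else math.nan,
    }


print(attribute_stats(["example.com", "test.com"]))
```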

 

Types

At the heart of the Demyst Platform is its type system.
Types are associated with column names. For example, a column named post_code is expected to contain a postal code.

Data type       Example                   Description
blob            RGVteXN0                  Base64-encoded binary data
business_name   Demyst Data Ltd.          The name of a company
city            New York City             The name of a city
country         US, AU, SG                A 2- or 3-character ISO country code (https://en.wikipedia.org/wiki/ISO_3166-1_alpha-2 or https://en.wikipedia.org/wiki/ISO_3166-1_alpha-3)
domain          demyst.com                An internet domain name
email_address   support@demyst.com        An email address
first_name      John                      A first name
full_name       John Doe                  A full name
gender          m, male, f, female        A gender or abbreviation
ip4             192.168.0.1               IP address (version 4)
last_name       Smith                     A last name
latitude        40.7                      A number between -90.0 and 90.0
longitude       -73.9                     A number between -180.0 and 180.0
marital_status  m, married, s, single, …  A marital status or abbreviation
middle_name     Rupert                    A middle name
number          42                        A number; supports integral and decimal numbers of arbitrary size and precision
percentage      99%, 99                   A number between 0.0 and 100.0
phone           917-475-1881              Country-dependent. For the US, 10 digits without a leading 1, or 11 digits with it; the area code must be valid
post_code       10001                     For the US, a 5- or 9-digit postcode, with or without a dash separating the parts. For other countries, must be non-empty
sic_code        2024                      A Standard Industrial Classification code; a 4-character string
state           NY, New York              For the US, a valid 2-character state code or state name; empty otherwise
street          100 Main St               A non-empty street name
string          foo                       A character string
url             https://www.demyst.com    A Uniform Resource Locator, starting with http: or https:
us_ein          12-3456789                An Employer Identification Number. We strip dashes and spaces from the input; the result must be a 9-digit numeric string
us_ssn          078-05-1120               A Social Security Number. We strip dashes and spaces from the input; the result must be a 9-digit numeric string
us_ssn4         1120                      The last four digits of a Social Security Number
year_month      2019-01                   A particular month of a year, in the format yyyy-MM
year            2019                      A year
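Several of these rules are precise enough to check locally before you upload data. A minimal sketch covering only the US post_code, us_ein, year_month, and latitude/longitude rules, written from the descriptions in the table; the function names are hypothetical, the range checks assume inclusive bounds, and the Demyst service remains the authority on what it accepts.

```python
import re


def is_us_post_code(value):
    """US post_code: 5 or 9 digits, optionally with a dash after the first 5."""
    return re.fullmatch(r"[0-9]{5}(-?[0-9]{4})?", value) is not None


def normalize_us_ein(value):
    """us_ein: strip dashes and spaces, then require exactly 9 digits."""
    digits = value.replace("-", "").replace(" ", "")
    if not re.fullmatch(r"[0-9]{9}", digits):
        raise ValueError(f"not a valid EIN: {value!r}")
    return digits


def is_year_month(value):
    """year_month: yyyy-MM with a month from 01 to 12."""
    return re.fullmatch(r"[0-9]{4}-(0[1-9]|1[0-2])", value) is not None


def is_lat_lon(latitude, longitude):
    """latitude in [-90, 90] and longitude in [-180, 180], inclusive."""
    return -90.0 <= latitude <= 90.0 and -180.0 <= longitude <= 180.0


print(normalize_us_ein("12-3456789"))
```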