Geopulse On-Prem Overview

Geopulse On-Prem is on-premises software designed for low-latency, high-volume environments where network latency rules out hosted options. It serves several purposes, including:

  • Providing a low-latency index and storage container for real-time queries by location (in Proximity) and by ID (in Audience)
  • Transporting raw location data from your servers to Factual for processing
  • Returning enriched data from Factual to your local infrastructure

There are two different types of implementations:

  • Java Client : Java libraries for lower-level integration
  • HTTP Server : HTTP server that provides a REST interface to Geopulse libraries


Choosing an On-Prem Implementation

We’ve made the on-prem software available in two variants to accommodate the widest range of deployments. Functionally, the HTTP and Java implementations are nearly identical: both support Geopulse Audience and Geopulse Proximity. The Java library is faster, while the HTTP server offers GET/POST convenience for a modest trade-off in performance. Both are high-performance solutions designed to be queried with minimal latency at runtime.

Implementation patterns usually follow these guidelines:

  • Java Client: use this variant if your team is comfortable with Java, latency is critical, and the service must be incorporated at the lowest level of your infrastructure. The Java library is usually distributed across tens or hundreds of machines, and commonly shares resources with other software, such as an ad server.
  • HTTP Server: use this variant for the easiest integration. The HTTP variant usually runs on a dedicated box, processing GET/POST requests from other machines in the same network.


Using Geopulse On-Prem

Geopulse On-Prem’s primary function is to serve as a low-latency look-up that tests for ‘hits’ against Audience Sets (whether a device ID falls within a specific audience you have created) or Proximity Sets (whether a geographic coordinate falls within a collection of geofences you have created). ‘Hit testing’ is done at runtime; if there is a ‘hit’, On-Prem returns the device or coordinate membership and other data. Here’s a quick overview of hit testing:

1. Test a Device ID against active Audience Sets. Returns a JSON array of matching sets:

http://[server]/geopulse/audience/sets?user-id=[user-id]


2. Test a Device ID for membership in an active Audience Set. Returns JSON-encoded true if user [user-id] is a member of [set-id], otherwise false:

http://[server]/geopulse/audience/sets/[set-id]?user-id=[user-id]


3. Test a coordinate against active Proximity Sets. Returns JSON-encoded map of matching Sets:

http://[server]/geopulse/proximity/indices?latitude=[latitude]&longitude=[longitude]


Detailed call documentation is provided in the respective Java Client and HTTP Server flavors of the product. Details of the match response packet are provided below.
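
The three hit-testing calls above can be assembled programmatically. The following is a minimal sketch that only builds the request URLs from the patterns shown. The function names and the host used in the usage note are hypothetical, and the coordinate range check is an assumption about what counts as a valid location rather than documented server behavior.

```python
from urllib.parse import quote, urlencode

BASE_PATH = "/geopulse"


def audience_sets_url(server: str, user_id: str) -> str:
    """URL that lists the active Audience Sets matching a device ID."""
    query = urlencode({"user-id": user_id})
    return f"http://{server}{BASE_PATH}/audience/sets?{query}"


def audience_membership_url(server: str, set_id: str, user_id: str) -> str:
    """URL that tests one device ID against one Audience Set."""
    query = urlencode({"user-id": user_id})
    # Percent-encode the set ID so it is safe to use as a path segment.
    return f"http://{server}{BASE_PATH}/audience/sets/{quote(set_id, safe='')}?{query}"


def proximity_url(server: str, latitude: float, longitude: float) -> str:
    """URL that tests a coordinate against the active Proximity Sets."""
    # Assumption: reject coordinates outside the usual WGS84 ranges up front.
    if not (-90 <= latitude <= 90 and -180 <= longitude <= 180):
        raise ValueError("latitude or longitude out of range")
    query = urlencode({"latitude": latitude, "longitude": longitude})
    return f"http://{server}{BASE_PATH}/proximity/indices?{query}"
```

For example, proximity_url("onprem.example.internal", 34.05, -118.25) yields a URL that can be passed to whichever HTTP client you already use; the response is the JSON described in the Match Response section.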


Network Connectivity & Telemetry

Geopulse On-Prem requires a connection to the Internet to operate correctly. This connection allows Factual to deliver Sets to the On-Prem machines and sends Telemetry back to Factual for debugging and monitoring.


Set Delivery Over HTTPS

Geopulse Sets — the compiled binary filters for Audience and Proximity — must be transferred to your server for local, low-latency querying.

Once created on Factual’s cluster, sets are stored on Factual’s S3 bucket, Northern California, us-west-2. Geopulse On-Prem requires access to this S3 bucket plus a Factual endpoint that provides updated configurations. In total, these are:

  • https://resources.geopulse.factual.s3.amazonaws.com
  • https://citadel.factual.com (< v2.0 HTTP, 4.0 Java lib), or
  • https://api.factual.com (>= v2.0 HTTP, 4.0 Java lib)

The software handles the connection and the download of Sets automatically. All data transfer is over HTTPS.
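
When preparing a firewall allowlist, it can help to derive the required hosts from the variant and version you run. The sketch below is illustrative only: the function names are hypothetical, and it assumes the version notes in the list above mean that the HTTP Server switched to api.factual.com at v2.0 and the Java library at 4.0.

```python
S3_BUCKET = "https://resources.geopulse.factual.s3.amazonaws.com"
LEGACY_CONFIG = "https://citadel.factual.com"
CURRENT_CONFIG = "https://api.factual.com"

# Assumed cutover versions, read from the endpoint list above.
CUTOVER = {"http": (2, 0), "java": (4, 0)}


def parse_version(text: str) -> tuple:
    """Turn 'v1.9' or '4.1.3' into a comparable tuple such as (1, 9)."""
    parts = [int(p) for p in text.lstrip("v").split(".")]
    while len(parts) < 2:  # pad so that '2' compares equal to '2.0'
        parts.append(0)
    return tuple(parts)


def required_endpoints(variant: str, version: str) -> list:
    """Hosts that must be reachable over outbound HTTPS."""
    if parse_version(version) >= CUTOVER[variant]:
        config = CURRENT_CONFIG
    else:
        config = LEGACY_CONFIG
    return [S3_BUCKET, config]
```

Calling required_endpoints("http", "v1.9") returns the S3 bucket plus the citadel host, while any HTTP Server version from 2.0 onward returns the S3 bucket plus api.factual.com.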


Security

During normal operation, the Proximity software establishes outbound HTTPS connections to: https://api.factual.com, which tells the client which data files are available for download, and a subdomain of https://amazonaws.com, which serves the actual data files. Connections to both services are over HTTPS, which provides two guarantees:

  • A malicious actor who controls one or more routers between the client and server, or who has hijacked client-side DNS, cannot impersonate Factual or Amazon Web Services (no man-in-the-middle attack).
  • An actor who controls any intermediate router(s) will be unable to read the traffic between client and server (no eavesdropping).

The data files downloaded by the client are stored in the file system in a non-executable context (no execute bit set), by a user account that does not have permission to write to system directories. The Proximity software has no capability to run arbitrary code, by design. The only operation it can perform on the downloaded files is to read them into memory, and use that data to perform location-to-match lookups.


Telemetry Over HTTPS

The network connection further provides Factual with server telemetry — basic information about the health and running of the server.

Factual logs the following data from your instance of Geopulse On-Prem:

  • A uniformly random selection of unannotated points (Proximity) or identifiers (Audience)
  • A selection of latencies
  • Requests per second sampled over 60 seconds
  • A selection of responses
  • Responses to status pings
  • A complete list of loaded Sets
  • Version, timestamp, and running process

Telemetry is delivered encrypted over HTTPS. Once delivered, the telemetry is used to identify potential issues in the definition or deployment of an index, and to provide context for Factual engineers when debugging an issue. Telemetry data is deleted six months after collection.

A sample telemetry packet is available here for review.


Memory and Performance

Each instance of Geopulse On-Prem must have enough memory allocated to store all indices. On the Proximity side, index size is proportional to the area the index covers. For example, a Proximity index that describes a 500m radius around every bar and nightclub in the US requires 400MB. Audience Sets have lighter data requirements, at about 100MB per 100 million users. For a more detailed discussion of memory management, please consult our Memory Management Best Practices Guide.

Performance varies with hardware and with the total number of device IDs across all Audience Sets. Note that two sets that contain the same users will still require twice the memory.
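
The figures above allow a rough memory estimate. The sketch below assumes the quoted ratio of about 100MB per 100 million users scales linearly (roughly 1MB per million device IDs) and that Proximity index sizes are already known. The function names are hypothetical, and the result is a planning figure, not a guarantee.

```python
# About 100MB per 100 million users, as quoted above.
MB_PER_MILLION_USERS = 1.0


def audience_memory_mb(set_sizes):
    """Estimate memory for Audience Sets from the device-ID count of each set.

    Sizes are summed per set: overlapping users are counted once for
    every set they appear in, because sets do not share memory.
    """
    return sum(set_sizes) / 1_000_000 * MB_PER_MILLION_USERS


def total_memory_mb(audience_set_sizes, proximity_index_mb):
    """Add known Proximity index sizes (in MB) to the Audience estimate."""
    return audience_memory_mb(audience_set_sizes) + sum(proximity_index_mb)
```

For instance, two Audience Sets of 100 million users each come to about 200MB even if they contain exactly the same users, and adding the 400MB bar-and-nightclub Proximity index brings the estimate to about 600MB.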

You can expect an average response time of <1ms and 20k QPS on a standard four-core server (50k QPS with the Java Client).
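
As a back-of-the-envelope sizing aid, the throughput figures above can be turned into a server count. This is only a sketch: the headroom parameter is an assumption added for illustration, the function name is hypothetical, and real throughput depends on your hardware and loaded Sets.

```python
import math

# Expected queries per second on a standard four-core server, as quoted above.
QPS_PER_SERVER = {"http": 20_000, "java": 50_000}


def servers_needed(peak_qps: float, variant: str, headroom: float = 0.25) -> int:
    """Estimate how many four-core servers are needed for a peak load.

    headroom is the fraction of each server's capacity kept in reserve
    (an assumption for illustration, not a Factual recommendation).
    """
    usable_qps = QPS_PER_SERVER[variant] * (1 - headroom)
    return max(1, math.ceil(peak_qps / usable_qps))
```

With no headroom, a peak of 100,000 QPS works out to five HTTP Server machines or two machines running the Java Client.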

For optimal HTTP Server performance, we discuss several common implementation patterns here.


Logs

Geopulse On-Prem logs to /var/log/factual/outpost/outpost.log by default, writing a JSON-encoded digest of requests every 10 seconds. Per the init script, logs are rotated daily and old logs can be cleaned up. Under normal conditions the log grows by only about 1MB a day. The most recent request digest can be fetched from /zz/status. See Server Status for an example of what you should expect to see on that page.
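
If you would rather read the digest from the log file than from /zz/status, something like the following may work. It assumes each digest is written as a single JSON object on its own line, which the description above does not confirm, and it makes no assumptions about the fields inside the digest. The function names are hypothetical.

```python
import json
from pathlib import Path

DEFAULT_LOG = Path("/var/log/factual/outpost/outpost.log")


def latest_digest(lines):
    """Return the last line that parses as a JSON object, or None."""
    latest = None
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip anything that is not a JSON object
        try:
            latest = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip partially written or malformed lines
    return latest


def read_latest_digest(path: Path = DEFAULT_LOG):
    """Read the most recent digest from the log file at path."""
    with path.open(encoding="utf-8") as handle:
        return latest_digest(handle)
```

Because latest_digest accepts any iterable of lines, it can be exercised against a saved copy of the log before being pointed at the live file.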


Error Codes

  • CANNOT_CONNECT – On-Prem cannot fetch new Audience and/or Proximity sets. On-Prem will still serve matches, but will be unable to match against the latest Audience/Proximity sets.
  • INSUFFICIENT_MEMORY – The memory used to store Audience and/or Proximity Sets has exceeded what was specified on initialization.
  • NOT_CONFIGURED – The given organization is not registered as a consumer for the given client.

The ProximityClient throws an InvalidLocationException when the lat/long passed is not valid.
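
One way to make these codes actionable is to map each one to an operator hint. The sketch below assumes the code reaches your application as a plain string; how errors are actually surfaced depends on the variant you run. The enum name, function name, and suggested actions are illustrative inferences from the descriptions above.

```python
from enum import Enum


class OnPremError(Enum):
    # Messages paraphrase the descriptions above; the suggested
    # follow-up actions are assumptions, not Factual guidance.
    CANNOT_CONNECT = (
        "Sets may be stale: matches are still served, but new Sets "
        "cannot be fetched. Check outbound HTTPS access."
    )
    INSUFFICIENT_MEMORY = (
        "Loaded Sets exceed the memory specified at initialization. "
        "Allocate more memory or load fewer Sets."
    )
    NOT_CONFIGURED = (
        "The organization is not registered as a consumer for this "
        "client. Check the account configuration."
    )


def describe(code: str) -> str:
    """Return an operator hint for an error code string."""
    try:
        return OnPremError[code].value
    except KeyError:
        return f"Unrecognized error code: {code}"
```

A monitoring script could call describe("CANNOT_CONNECT") and treat the result as a warning rather than an outage, since lookups continue against the Sets already loaded.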


Data Pixel

Factual’s pixel-based feature assists Geopulse users with reporting and data input. It complements similar approaches in the industry. Geopulse On-Prem provides the pixel URL within the Match object.

The pixel addresses two specific use cases:

  1. Tracking the use of Factual Geopulse data in ads served.
  2. Collecting data for Geopulse Audience creation.

Details of implementation are provided in the Factual Data Pixel documentation.


Match Response

Although Geopulse Proximity and Geopulse Audience are two different products, each with its own use cases, Geopulse On-Prem can power either one or both concurrently. To facilitate integration and onboarding, we’ve designed the response packets to be identical:

Attribute | JSON Field | Description
Design Name | designName | Name of the Geopulse Proximity or Audience Design, assigned in the Designer UI.
Design ID | designId | Factual generated unique ID for the Design.
Build ID | buildId | Factual generated unique ID for the built snapshot of data for the Design.
Deployment ID | deploymentId | Factual generated unique ID for the associated deploy of the Design.
Deployment Tags | deploymentTags | String tags associated with the deploy of the Design as assigned.
Set ID | setId | Factual generated unique ID for the Set within the Design that matched.
Targeting Code | targetingCode | A string assigned to the Set when designing the Design in the Designer UI. (In deprecated versions this was formerly called the “Group ID”.)
Data Source | dataSource | Factual provided ID for the associated audience data source. (Geopulse Audience only)
Price | price | CPM price, in cents, for the match. Prices for different creative type and rate codes could be specified when fetching.
Data Pixel URL Prefix | dataPixelUrlPrefix | Factual generated URL prefix to implement Pixel assisted features. More parameters are required to be appended to this prefix to form the final Data Pixel URL. Read more on the Data Pixel here.
Metadata | metadata | A map of keys and values describing the matched set within the Design.
Payload | payload | JSON payload describing the matched set, available as string or bytes.

See the page on Geopulse Terminology for detailed descriptions of each return value.


Proximity Response Example
[{
  "designName": "BigBox",
  "targetingCode": "Costco",
  "deploymentId": "ABCEHGEEG3q45hasdShe",
  "setId": "xyzHGEEG3q45hasdShe",
  "payload": {},
  "dataPixelUrlPrefix": "https://api.factual.com/geopulse/pixel/RmFjdHVhbCBHZW9wdWxzZSBPbi1QcmVtIEZUV"
}, {
  "designName": "Electronics",
  "targetingCode": "Best Buy",
  "deploymentId": "ABMnOwHGbdcG3q45haoazq",
  "setId": "xyzHGEqa68Dxhasdhe",
  "payload": {}
}]


Audience Response Example
[{
  "designName": "affluent-business-travelers",
  "targetingCode": "affluentBiz",
  "deploymentId": "AyoLwbEHGbdcG3q45haoae",
  "setId": "xyzHlQbiqa353q45ha3h9e",
  "dataPixelUrlPrefix": "https://api.factual.com/geopulse/pixel/RmFjdHVhbCBHZW9wdWxzZSBPbi1QcmVtIEZUV"
}, {
  "designName": "san-diego-moms",
  "targetingCode": "SanDiegoMoms",
  "deploymentId": "ABCEHGbdcGwZqb3q45hao9e8",
  "setId": "xyzHqGExqsa353q45hys5he7"
}]
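
Because Proximity and Audience share one response shape, a single parser can handle both. The sketch below reads the fields that appear in the examples above into a small data class. The class and function names are hypothetical, and every field is treated as optional because the examples show that some fields (such as dataPixelUrlPrefix and payload) may be absent.

```python
import json
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class Match:
    design_name: Optional[str] = None
    targeting_code: Optional[str] = None
    deployment_id: Optional[str] = None
    set_id: Optional[str] = None
    payload: Any = None
    data_pixel_url_prefix: Optional[str] = None


def parse_matches(body: str) -> List[Match]:
    """Parse a JSON array of matches; missing fields become None."""
    return [
        Match(
            design_name=item.get("designName"),
            targeting_code=item.get("targetingCode"),
            deployment_id=item.get("deploymentId"),
            set_id=item.get("setId"),
            payload=item.get("payload"),
            data_pixel_url_prefix=item.get("dataPixelUrlPrefix"),
        )
        for item in json.loads(body)
    ]


def targeting_codes(body: str) -> List[Optional[str]]:
    """Convenience helper: just the targeting codes, in response order."""
    return [match.targeting_code for match in parse_matches(body)]
```

Applied to the Proximity example above, targeting_codes would return the two codes "Costco" and "Best Buy"; an empty JSON array means the lookup produced no hits.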