existence is a core attribute of Factual’s data that deserves special attention. It is a machine-learned numerical value between 0.0 and 1.0 (rounded to the nearest tenth) applied to every POI record, and it indicates how confident we are that the POI is real, open, and not a duplicate. Records deemed to be bad are set to existence = 0.0. We derive these scores by training ML models on a variety of signals, including social signals such as user check-ins and tags.

By providing a range of confidence values, we enable our partners to filter out records below a threshold that suits their particular needs. The higher the threshold, the more accurate and precise the data will be, but coverage will be lower (this tends to be the strategy for mapping and display use cases). The lower the threshold, the more comprehensive the data will be, but it will contain more bad records (this tends to be the strategy for search and active-user use cases).
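
The threshold filter described above can be sketched in a few lines of Python. This is only an illustration, not Factual's delivery format or tooling: the record layout (dictionaries with an "existence" key), the function name filter_by_existence, and the example thresholds 0.7 and 0.2 are all assumptions made for the sketch.

```python
from typing import Any, Iterable, Iterator, Mapping


def filter_by_existence(
    records: Iterable[Mapping[str, Any]], threshold: float
) -> Iterator[Mapping[str, Any]]:
    """Yield only the records whose existence score meets the threshold.

    Assumes each record is a mapping with a numeric "existence" field
    (hypothetical layout, chosen for illustration).
    """
    if not 0.0 < threshold <= 1.0:
        raise ValueError("threshold must be greater than 0.0 and at most 1.0")
    for record in records:
        if record["existence"] >= threshold:
            yield record


# Made-up sample records, not real Factual data.
sample = [
    {"name": "Cafe A", "existence": 0.9},
    {"name": "Cafe B", "existence": 0.4},
    {"name": "Cafe C", "existence": 0.0},
]

# A stricter threshold trades coverage for accuracy (e.g., mapping/display).
display_ready = list(filter_by_existence(sample, 0.7))

# A looser threshold keeps more records (e.g., search), still dropping 0.0.
search_ready = list(filter_by_existence(sample, 0.2))
```

With these sample values, the stricter threshold keeps only the highest-scoring record, while the looser one keeps two and still excludes the record scored 0.0.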

Important Considerations

  • Factual delivers country data in its entirety to our partners (i.e., our data files include records with existence = 0.0). We therefore strongly recommend that all partners apply an existence threshold greater than 0.0 to filter out known closed businesses, duplicates, and junk data.
  • existence is not a probabilistic score; records with an existence of 0.9 will not necessarily be real, open, and non-duplicate 90% of the time.
  • Every country has its own existence model and score distribution (e.g., a 0.5 score in the US likely does not represent the same level of confidence as a 0.5 in CA).
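
Because each country has its own model and score distribution, one way to act on the considerations above is to keep a separate threshold per country rather than a single global cutoff. The sketch below is a hedged illustration only: the "country" and "existence" field names, the keep_record helper, and every threshold value are hypothetical and would need to be tuned against your own evaluation of each country's data.

```python
from typing import Any, Mapping

# Hypothetical thresholds for illustration; these are not recommended values.
DEFAULT_THRESHOLD = 0.5
COUNTRY_THRESHOLDS = {"US": 0.6, "CA": 0.4}


def keep_record(
    record: Mapping[str, Any],
    thresholds: Mapping[str, float] = COUNTRY_THRESHOLDS,
    default: float = DEFAULT_THRESHOLD,
) -> bool:
    """Return True if the record meets the threshold chosen for its country.

    Assumes the record has "country" and "existence" fields (hypothetical
    layout). Countries without an entry fall back to the default threshold.
    """
    threshold = thresholds.get(record["country"], default)
    return record["existence"] >= threshold


# Made-up sample records, not real Factual data.
records = [
    {"name": "Shop A", "country": "US", "existence": 0.5},
    {"name": "Shop B", "country": "CA", "existence": 0.5},
    {"name": "Shop C", "country": "FR", "existence": 0.0},
]

kept = [r["name"] for r in records if keep_record(r)]
```

In this sketch the same 0.5 score is dropped for the US record and kept for the CA record, and the 0.0 record is always excluded as long as every threshold stays above 0.0.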