Grundwerk Digital

API Documentation

Complete API reference for Grundwerk Digital platform integration. Build powerful integrations using our RESTful API.

v1.0 REST API
Base URL: https://your-domain.com

Authentication

All API endpoints require authentication with an API key. Send the key in the api_key request header with every request.

Quick Start

API Key Header
api_key: {{your_api_key}}

Base URL

All API requests should be made to the following base URL:

Production Environment
https://web-production-603a8.up.railway.app
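The header and base URL above can be combined as in the following Python sketch, which only uses the standard library. The helper name build_request and the placeholder key are assumptions made for illustration; they are not part of an official client. Nothing is sent over the network here.

```python
import json
import urllib.parse
import urllib.request

# Production base URL as listed on this page.
BASE_URL = "https://web-production-603a8.up.railway.app"


def build_request(method, path, api_key, params=None, body=None):
    """Assemble (but do not send) an authenticated request.

    build_request is a hypothetical helper, not part of an official client.
    """
    url = BASE_URL + path
    if params:
        url += "?" + urllib.parse.urlencode(params)
    data = json.dumps(body).encode("utf-8") if body is not None else None
    # urllib normalises header capitalisation; HTTP header names are case-insensitive.
    headers = {"api_key": api_key, "Content-Type": "application/json"}
    return urllib.request.Request(url, data=data, headers=headers, method=method)


request = build_request(
    "GET",
    "/company_get",
    api_key="your_api_key",  # placeholder, replace with a real key
    params={"company_id": "00000000-0000-0000-0000-000000000000"},
)
print(request.get_method(), request.full_url)
```

Passing the returned object to urllib.request.urlopen would perform the actual call.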

Companies

Manage company records in your database. Create, read, update, and delete company information including business details, contact information, and related metadata.

GET /company_get

Get a single company record by its ID.

Required: The following fields are mandatory

company_id

Query Parameters

company_id (required)
uuid

The unique identifier of the company

Example: 00000000-0000-0000-0000-000000000000

workspace_id (client_data)
uuid

The workspace identifier

Example: 00000000-0000-0000-0000-000000000000

GET
curl -X GET "https://web-production-603a8.up.railway.app/company_get?company_id=00000000-0000-0000-0000-000000000000&workspace_id=00000000-0000-0000-0000-000000000000" \
  -H "api_key: {{your_api_key}}" \
  -H "Content-Type: application/json"
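A minimal Python sketch of how the query string for this endpoint might be assembled, assuming that workspace_id may be omitted (it is not listed as mandatory). The function name company_get_url is hypothetical. Both IDs are parsed as UUIDs first so that malformed values fail locally instead of on the server.

```python
import uuid
from urllib.parse import urlencode

BASE_URL = "https://web-production-603a8.up.railway.app"


def company_get_url(company_id, workspace_id=None):
    """Build the /company_get URL; workspace_id is left out when not given."""
    # uuid.UUID raises ValueError for anything that is not a well-formed UUID.
    params = {"company_id": str(uuid.UUID(company_id))}
    if workspace_id is not None:
        params["workspace_id"] = str(uuid.UUID(workspace_id))
    return f"{BASE_URL}/company_get?{urlencode(params)}"


url = company_get_url("00000000-0000-0000-0000-000000000000")
print(url)
```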

Response Parameters

company_name_cleaned (string)

Cleaned/standardized version of the company name

Example: Example Company GmbH

company_legal_form (string)

Legal form of the company (e.g., GmbH, AG, LLC)

Example: GmbH

b2b_b2c (string)

Business model classification

Example: B2B

company_name_imprint (string)

Company name as it appears in official imprint/legal notices

Example: Example Company GmbH

company_street (string)

Street name of company address

Example: Hauptstraße

company_street_nr (string)

Street number of company address

Example: 123

company_city (string)

City where company is located

Example: Berlin

company_zip (string)

Postal/ZIP code

Example: 10115

company_region (string)

Region/state where company is located

Example: Berlin

company_country (string)

Country where company is located

Example: Germany

company_tax_nr (string)

Tax identification number (Steuernummer)

Example: 12/345/67890

company_vat_nr (string)

VAT identification number (Umsatzsteuer-ID)

Example: DE123456789

company_handels_register (string)

Commercial register number (Handelsregisternummer)

Example: HRB 12345

company_employees_research (string)

Employee count from research

Example: 50-100

company_employees_linkedin (string)

Employee count from LinkedIn

Example: 75

company_founded_year (string)

Year the company was founded

Example: 2010

company_description (string)

Company description/about text

Example: Leading provider of software solutions

company_logo_url (string)

URL to company logo image

Example: https://example.com/logo.png

company_size_linkedin (string)

Company size range from LinkedIn

Example: 51-200 employees

company_linkedin_followers (string)

Number of LinkedIn followers

Example: 1234

company_tags (array)

Array of tags/categories for the company

Example: ["tech","software","b2b"]

company_sources (array)

Array of data sources

Example: ["website","linkedin","research"]

db_companies_main_created_at (string)

Timestamp when company record was created (ISO 8601 format)

Example: 2024-01-15T10:30:00Z

db_companies_main_updated_at (string)

Timestamp when company record was last updated (ISO 8601 format)

Example: 2024-03-20T14:45:00Z

company_names (array)

All company names associated with this company. Always present (empty array if none)

Example: ["Example Company","Example Co.","Example GmbH"]

company_domains (array)

All domain names associated with this company. Always present (empty array if none)

Example: ["example.com","example.de"]

company_emails (array)

All email addresses associated with this company. Always present (empty array if none)

Example: ["info@example.com","contact@example.com"]

company_phones (array)

All phone numbers associated with this company. Always present (empty array if none)

Example: ["+49301234567","+49301234568"]

company_linkedins (array)

All LinkedIn URLs associated with this company. Always present (empty array if none)

Example: ["https://linkedin.com/company/example"]

company_instagrams (array)

All Instagram URLs associated with this company. Always present (empty array if none)

Example: ["https://instagram.com/example"]

company_facebooks (array)

All Facebook URLs associated with this company. Always present (empty array if none)

Example: ["https://facebook.com/example"]

company_xings (array)

All Xing URLs associated with this company. Always present (empty array if none)

Example: []

company_pinterests (array)

All Pinterest URLs associated with this company. Always present (empty array if none)

Example: []

company_tiktoks (array)

All TikTok URLs associated with this company. Always present (empty array if none)

Example: []

company_youtubes (array)

All YouTube URLs associated with this company. Always present (empty array if none)

Example: ["https://youtube.com/@example"]

company_twitters (array)

All Twitter/X URLs associated with this company. Always present (empty array if none)

Example: ["https://twitter.com/example"]

company_workspace_id (uuid)

UUID of the workspace connection. null if workspace_id not provided in request OR workspace connection doesn't exist

Example: 00000000-0000-0000-0000-000000000000

company_qualified (string)

Qualification status in workspace. null if not set or no workspace connection

Example: yes

company_custom_tags_ws (array)

Custom workspace-specific tags. null if not set or no workspace connection

Example: ["priority","enterprise"]

db_companies_workspace_created_at (string)

Timestamp when workspace connection was created (ISO 8601 format). null if no workspace connection

Example: 2024-01-20T11:00:00Z

db_companies_workspace_updated_at (string)

Timestamp when workspace connection was last updated (ISO 8601 format). null if no workspace connection

Example: 2024-03-22T16:30:00Z

Response
{
  "company_name_cleaned": "Example Company GmbH",
  "company_legal_form": "GmbH",
  "b2b_b2c": "B2B",
  "company_name_imprint": "Example Company GmbH",
  "company_street": "Hauptstraße",
  "company_street_nr": "123",
  "company_city": "Berlin",
  "company_zip": "10115",
  "company_region": "Berlin",
  "company_country": "Germany",
  "company_tax_nr": "12/345/67890",
  "company_vat_nr": "DE123456789",
  "company_handels_register": "HRB 12345",
  "company_employees_research": "50-100",
  "company_employees_linkedin": "75",
  "company_founded_year": "2010",
  "company_description": "Leading provider of software solutions",
  "company_logo_url": "https://example.com/logo.png",
  "company_size_linkedin": "51-200 employees",
  "company_linkedin_followers": "1234",
  "company_tags": [
    "tech",
    "software",
    "b2b"
  ],
  "company_sources": [
    "website",
    "linkedin",
    "research"
  ],
  "db_companies_main_created_at": "2024-01-15T10:30:00Z",
  "db_companies_main_updated_at": "2024-03-20T14:45:00Z",
  "company_names": [
    "Example Company",
    "Example Co.",
    "Example GmbH"
  ],
  "company_domains": [
    "example.com",
    "example.de"
  ],
  "company_emails": [
    "info@example.com",
    "contact@example.com"
  ],
  "company_phones": [
    "+49301234567",
    "+49301234568"
  ],
  "company_linkedins": [
    "https://linkedin.com/company/example"
  ],
  "company_instagrams": [
    "https://instagram.com/example"
  ],
  "company_facebooks": [
    "https://facebook.com/example"
  ],
  "company_xings": [],
  "company_pinterests": [],
  "company_tiktoks": [],
  "company_youtubes": [
    "https://youtube.com/@example"
  ],
  "company_twitters": [
    "https://twitter.com/example"
  ],
  "company_workspace_id": "00000000-0000-0000-0000-000000000000",
  "company_qualified": "yes",
  "company_custom_tags_ws": [
    "priority",
    "enterprise"
  ],
  "db_companies_workspace_created_at": "2024-01-20T11:00:00Z",
  "db_companies_workspace_updated_at": "2024-03-22T16:30:00Z"
}
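The null rules in the response parameters (identifier arrays are always present, workspace fields are null without a workspace connection) could be handled on the client roughly as follows. This is a sketch in Python; summarize_company and the keys of the summary it returns are hypothetical names, and picking the first domain is an arbitrary choice made for illustration.

```python
def summarize_company(record):
    """Reduce a /company_get response body to a few convenient values."""
    # Identifier arrays are documented as always present; fall back defensively anyway.
    domains = record.get("company_domains") or []
    return {
        "name": record.get("company_name_cleaned"),
        "first_domain": domains[0] if domains else None,
        # company_workspace_id is null when there is no workspace connection.
        "in_workspace": record.get("company_workspace_id") is not None,
        # Workspace-only fields may be null, so normalise to an empty list.
        "custom_tags": record.get("company_custom_tags_ws") or [],
    }


without_workspace = {
    "company_name_cleaned": "Example Company GmbH",
    "company_domains": ["example.com", "example.de"],
    "company_workspace_id": None,
    "company_custom_tags_ws": None,
}
summary = summarize_company(without_workspace)
print(summary)
```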

POST /company_lookup

Look up whether a company exists, and push it if it does not.

Required: At least one of the following conditions must be met

company_name OR company_domain OR company_linkedin

Request Body

workspace_id (client_data)
uuid

The workspace identifier

Example: 00000000-0000-0000-0000-000000000000

company_name (at least 1 required)
string

The name of the company to lookup

Example: Acme Corporation

company_domain (at least 1 required)
string

The domain of the company to lookup

Example: acme.com

company_linkedin (at least 1 required)
string

The LinkedIn URL of the company to lookup

Example: https://linkedin.com/company/acme

POST
curl -X POST "https://web-production-603a8.up.railway.app/company_lookup" \
  -H "api_key: {{your_api_key}}" \
  -H "Content-Type: application/json" \
  -d '{
  "workspace_id": "00000000-0000-0000-0000-000000000000",
  "company_name": "Acme Corporation",
  "company_domain": "acme.com",
  "company_linkedin": "https://linkedin.com/company/acme"
}'
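The "at least one of" rule for this request body can be checked before sending. The Python sketch below is one possible way to do that; build_lookup_payload is a hypothetical helper, and treating empty strings as missing is an assumption.

```python
IDENTIFIER_FIELDS = ("company_name", "company_domain", "company_linkedin")


def build_lookup_payload(workspace_id=None, **identifiers):
    """Build a /company_lookup body with at least one identifier set."""
    unknown = set(identifiers) - set(IDENTIFIER_FIELDS)
    if unknown:
        raise ValueError(f"unsupported fields: {sorted(unknown)}")
    # Drop identifiers that are None or empty (assumption: empty means missing).
    payload = {key: value for key, value in identifiers.items() if value}
    if not payload:
        raise ValueError("provide at least one of: " + ", ".join(IDENTIFIER_FIELDS))
    if workspace_id is not None:
        payload["workspace_id"] = workspace_id
    return payload


payload = build_lookup_payload(company_domain="acme.com")
print(payload)
```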

Response Parameters

company_main_id (uuid)

UUID of the company in the db_companies_main table. Returns UUID if company found, null if not found.

Example: 00000000-0000-0000-0000-000000000000

company_workspace_id (uuid)

UUID of the company-workspace connection in db_companies_workspace table. Only populated if workspace_id was provided in request. Returns UUID if workspace connection exists, null if not found.

Example: 00000000-0000-0000-0000-000000000000

error (string)

Error message if lookup failed. null if no error occurred.

Response
{
  "company_main_id": "283f9e1f-89dd-4032-8e02-65acc6856ed1",
  "company_workspace_id": "00000000-0000-0000-0000-000000000000",
  "error": null
}
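The three response fields combine into a small number of outcomes. A Python sketch of one way to classify them follows; the function name and the outcome labels are made up for this example and are not values returned by the API.

```python
def interpret_lookup(response):
    """Map a /company_lookup response body to one outcome label."""
    if response.get("error") is not None:
        return "error"
    if response.get("company_main_id") is None:
        return "not_found"
    # company_workspace_id is only populated when workspace_id was sent
    # and a workspace connection exists.
    if response.get("company_workspace_id") is None:
        return "found_without_workspace"
    return "found_in_workspace"


outcome = interpret_lookup({
    "company_main_id": "283f9e1f-89dd-4032-8e02-65acc6856ed1",
    "company_workspace_id": None,
    "error": None,
})
print(outcome)
```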

POST /company_push

Push new company data to the database. If the company already exists, it is reported as "found" rather than created again.

Required: At least one of the following conditions must be met

company_name OR company_domain OR company_linkedin

Request Body

workspace_id (client_data)
uuid

The workspace identifier

Example: 00000000-0000-0000-0000-000000000000

company_name (at least 1 required)
string

Company name

Example: Acme Corporation

company_name_cleaned (optional)
string

Cleaned company name

Example: Acme Corporation

company_domain (at least 1 required)
string

Company domain

Example: acme.com

company_linkedin (at least 1 required)
string

Company LinkedIn URL

Example: https://linkedin.com/company/acme

company_email (optional)
string

Company email address

Example: info@acme.com

company_phone (optional)
string

Company phone number

Example: +1234567890

company_instagram (optional)
string

Company Instagram URL

Example: https://instagram.com/acme

company_facebook (optional)
string

Company Facebook URL

Example: https://facebook.com/acme

company_xing (optional)
string

Company XING URL

Example: https://xing.com/companies/acme

company_pinterest (optional)
string

Company Pinterest URL

Example: https://pinterest.com/acme

company_tiktok (optional)
string

Company TikTok URL

Example: https://tiktok.com/@acme

company_youtube (optional)
string

Company YouTube URL

Example: https://youtube.com/@acme

company_twitter (optional)
string

Company Twitter/X URL

Example: https://twitter.com/acme

company_legal_form (optional)
string

Company legal form

Example: Inc.

b2b_b2c (optional)
enum: b2b_b2c_type

Business model (B2B or B2C)

Example: B2B
Allowed values: B2B, B2C, both

company_imprint_name (optional)
string

Company imprint name

Example: Acme Corporation Inc.

company_street (optional)
string

Company street address

Example: Main Street

company_street_nr (optional)
string

Company street number

Example: 123

company_city (optional)
string

Company city

Example: München

company_zip (optional)
string

Company zip code

Example: 10001

company_region (optional)
string

Company region/state

Example: Bayern

company_country (optional)
string

Company country

Example: Germany

company_steuer_nr (optional)
string

Company tax number

Example: 30/321/50964

company_vat_nr (optional)
string

Company VAT number

Example: DE123456789

company_register_nr (optional)
string

Company registration number

Example: HRB 12345

employees_research (optional)
int8

Number of employees from research

Example: 75

employees_linkedin (optional)
int8

Number of employees from LinkedIn

Example: 75

company_founded_year (optional)
int8

Year company was founded

Example: 2010

description (optional)
string

Company description

Example: Leading provider of innovative solutions

company_logo_url (optional)
string

Company logo URL

Example: https://acme.com/logo.png

company_size_linkedin (optional)
string

Company size from LinkedIn

Example: 51-200

company_linkedin_followers (optional)
int8

Number of LinkedIn followers

Example: 1500

company_tags (optional)
enum: text (array)

Company tags

Example: ["technology","saas"]

company_sources (optional)
enum: data_sources (array)

Data sources

Example: ["linkedin","website"]
Allowed values: lusha, clay, apollo, north_data, d7_lead_finder, storeleads, build_with, sales_navigator

company_qualified (optional)
enum: pending_boolean

Qualification status

Example: qualified
Allowed values: qualified, pending, not_qualified

company_custom_tags_ws (optional)
enum: text (array)

Custom workspace tags

Example: ["vip","partner"]

POST
curl -X POST "https://web-production-603a8.up.railway.app/company_push" \
  -H "api_key: {{your_api_key}}" \
  -H "Content-Type: application/json" \
  -d '{
  "workspace_id": "00000000-0000-0000-0000-000000000000",
  "company_name": "Acme Corporation",
  "company_name_cleaned": "Acme Corporation",
  "company_domain": "acme.com",
  "company_linkedin": "https://linkedin.com/company/acme",
  "company_email": "info@acme.com",
  "company_phone": "+1234567890",
  "company_instagram": "https://instagram.com/acme",
  "company_facebook": "https://facebook.com/acme",
  "company_xing": "https://xing.com/companies/acme",
  "company_pinterest": "https://pinterest.com/acme",
  "company_tiktok": "https://tiktok.com/@acme",
  "company_youtube": "https://youtube.com/@acme",
  "company_twitter": "https://twitter.com/acme",
  "company_legal_form": "Inc.",
  "b2b_b2c": "B2B",
  "company_imprint_name": "Acme Corporation Inc.",
  "company_street": "Main Street",
  "company_street_nr": "123",
  "company_city": "München",
  "company_zip": "10001",
  "company_region": "Bayern",
  "company_country": "Germany",
  "company_steuer_nr": "30/321/50964",
  "company_vat_nr": "DE123456789",
  "company_register_nr": "HRB 12345",
  "employees_research": 75,
  "employees_linkedin": 75,
  "company_founded_year": 2010,
  "description": "Leading provider of innovative solutions",
  "company_logo_url": "https://acme.com/logo.png",
  "company_size_linkedin": "51-200",
  "company_linkedin_followers": 1500,
  "company_tags": [
    "technology",
    "saas"
  ],
  "company_sources": [
    "linkedin",
    "website"
  ],
  "company_qualified": "qualified",
  "company_custom_tags_ws": [
    "vip",
    "partner"
  ]
}'
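Some of the constraints in the request body (one identifier required, enum values, int8 fields) can be checked locally before the request is sent. The following Python sketch shows one way to do so; check_push_payload is a hypothetical helper, it covers only a subset of the fields, and whether the server applies exactly the same checks is not stated on this page.

```python
IDENTIFIER_FIELDS = ("company_name", "company_domain", "company_linkedin")
INTEGER_FIELDS = ("employees_research", "employees_linkedin",
                  "company_founded_year", "company_linkedin_followers")
ALLOWED_VALUES = {
    "b2b_b2c": {"B2B", "B2C", "both"},
    "company_qualified": {"qualified", "pending", "not_qualified"},
}


def check_push_payload(payload):
    """Return a list of problems found in a /company_push request body."""
    problems = []
    if not any(payload.get(field) for field in IDENTIFIER_FIELDS):
        problems.append("need at least one of: " + ", ".join(IDENTIFIER_FIELDS))
    for field, allowed in ALLOWED_VALUES.items():
        if field in payload and payload[field] not in allowed:
            problems.append(f"{field} must be one of {sorted(allowed)}")
    for field in INTEGER_FIELDS:
        value = payload.get(field)
        # bool is a subclass of int in Python, so exclude it explicitly.
        if value is not None and (isinstance(value, bool) or not isinstance(value, int)):
            problems.append(f"{field} must be an integer")
    return problems


good = {"company_domain": "acme.com", "b2b_b2c": "B2B", "employees_research": 75}
bad = {"b2b_b2c": "b2b", "company_founded_year": "2010"}
print(check_push_payload(good))
print(check_push_payload(bad))
```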

Response Parameters

company_main_id (uuid)

UUID of the company (either found or newly created). null only if operation failed.

Example: 00000000-0000-0000-0000-000000000000

company_workspace_id (uuid)

UUID of the company-workspace connection (either found or newly created). Only populated if workspace_id was provided in request. null if workspace operation not performed or failed.

Example: 00000000-0000-0000-0000-000000000000

status_company (enum)

Status of the company record operation. Can be "found" (company already existed), "created" (new company created), or null (operation failed).

Example: created
Possible values: found, created

status_company_workspace (enum)

Status of the workspace connection operation. Can be "found" (workspace connection already existed), "created" (new workspace connection created), or null (not performed or failed). Only relevant when workspace_id provided in request.

Example: created
Possible values: found, created

error (string)

Error message if any operation failed. null if all operations succeeded.

Response
{
  "company_main_id": "283f9e1f-89dd-4032-8e02-65acc6856ed1",
  "company_workspace_id": "00000000-0000-0000-0000-000000000000",
  "status_company": "created",
  "status_company_workspace": "created",
  "error": null
}
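A caller usually wants to know whether the push created something new. The Python sketch below reads the status fields accordingly; describe_push_result and the keys of its return value are hypothetical, and raising RuntimeError on a non-null error is just one possible policy.

```python
def describe_push_result(response):
    """Summarise a /company_push response body."""
    if response.get("error") is not None:
        raise RuntimeError(response["error"])
    return {
        "company_id": response.get("company_main_id"),
        # "found" means the company already existed, "created" means it is new.
        "company_created": response.get("status_company") == "created",
        "workspace_link_created": response.get("status_company_workspace") == "created",
    }


result = describe_push_result({
    "company_main_id": "283f9e1f-89dd-4032-8e02-65acc6856ed1",
    "company_workspace_id": None,
    "status_company": "found",
    "status_company_workspace": None,
    "error": None,
})
print(result)
```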

POST /company_push_patch

Update existing company data in the database, or create the company if it does not exist yet.

Required: At least one of the following conditions must be met

company_name OR company_domain OR company_linkedin

Request Body

workspace_id (client_data)
uuid

The workspace identifier

Example: 00000000-0000-0000-0000-000000000000

company_name (at least 1 required)
string

Company name

Example: Acme Corporation

company_name_cleaned (optional)
string

Cleaned company name

Example: Acme Corporation

company_domain (at least 1 required)
string

Company domain

Example: acme.com

company_linkedin (at least 1 required)
string

Company LinkedIn URL

Example: https://linkedin.com/company/acme

company_email (optional)
string

Company email address

Example: info@acme.com

company_phone (optional)
string

Company phone number

Example: +1234567890

company_instagram (optional)
string

Company Instagram URL

Example: https://instagram.com/acme

company_facebook (optional)
string

Company Facebook URL

Example: https://facebook.com/acme

company_xing (optional)
string

Company XING URL

Example: https://xing.com/companies/acme

company_pinterest (optional)
string

Company Pinterest URL

Example: https://pinterest.com/acme

company_tiktok (optional)
string

Company TikTok URL

Example: https://tiktok.com/@acme

company_youtube (optional)
string

Company YouTube URL

Example: https://youtube.com/@acme

company_twitter (optional)
string

Company Twitter/X URL

Example: https://twitter.com/acme

company_legal_form (optional)
string

Company legal form

Example: Inc.

b2b_b2c (optional)
enum: b2b_b2c_type

Business model (B2B or B2C)

Example: B2B
Allowed values: B2B, B2C, both

company_imprint_name (optional)
string

Company imprint name

Example: Acme Corporation Inc.

company_street (optional)
string

Company street address

Example: Main Street

company_street_nr (optional)
string

Company street number

Example: 123

company_city (optional)
string

Company city

Example: München

company_zip (optional)
string

Company zip code

Example: 10001

company_region (optional)
string

Company region/state

Example: Bayern

company_country (optional)
string

Company country

Example: Germany

company_steuer_nr (optional)
string

Company tax number

Example: 30/321/50964

company_vat_nr (optional)
string

Company VAT number

Example: DE123456789

company_register_nr (optional)
string

Company registration number

Example: HRB 12345

employees_research (optional)
int8

Number of employees from research

Example: 75

employees_linkedin (optional)
int8

Number of employees from LinkedIn

Example: 75

company_founded_year (optional)
int8

Year company was founded

Example: 2010

description (optional)
string

Company description

Example: Leading provider of innovative solutions

company_logo_url (optional)
string

Company logo URL

Example: https://acme.com/logo.png

company_size_linkedin (optional)
string

Company size from LinkedIn

Example: 51-200

company_linkedin_followers (optional)
int8

Number of LinkedIn followers

Example: 1500

company_tags (optional)
enum: text (array)

Company tags

Example: ["technology","saas"]

company_sources (optional)
enum: data_sources (array)

Data sources

Example: ["linkedin","website"]
Allowed values: lusha, clay, apollo, north_data, d7_lead_finder, storeleads, build_with, sales_navigator

company_qualified (optional)
enum: pending_boolean

Qualification status

Example: qualified
Allowed values: qualified, pending, not_qualified

company_custom_tags_ws (optional)
enum: text (array)

Custom workspace tags

Example: ["vip","partner"]

POST
curl -X POST "https://web-production-603a8.up.railway.app/company_push_patch" \
  -H "api_key: {{your_api_key}}" \
  -H "Content-Type: application/json" \
  -d '{
  "workspace_id": "00000000-0000-0000-0000-000000000000",
  "company_name": "Acme Corporation",
  "company_name_cleaned": "Acme Corporation",
  "company_domain": "acme.com",
  "company_linkedin": "https://linkedin.com/company/acme",
  "company_email": "info@acme.com",
  "company_phone": "+1234567890",
  "company_instagram": "https://instagram.com/acme",
  "company_facebook": "https://facebook.com/acme",
  "company_xing": "https://xing.com/companies/acme",
  "company_pinterest": "https://pinterest.com/acme",
  "company_tiktok": "https://tiktok.com/@acme",
  "company_youtube": "https://youtube.com/@acme",
  "company_twitter": "https://twitter.com/acme",
  "company_legal_form": "Inc.",
  "b2b_b2c": "B2B",
  "company_imprint_name": "Acme Corporation Inc.",
  "company_street": "Main Street",
  "company_street_nr": "123",
  "company_city": "München",
  "company_zip": "10001",
  "company_region": "Bayern",
  "company_country": "Germany",
  "company_steuer_nr": "30/321/50964",
  "company_vat_nr": "DE123456789",
  "company_register_nr": "HRB 12345",
  "employees_research": 75,
  "employees_linkedin": 75,
  "company_founded_year": 2010,
  "description": "Leading provider of innovative solutions",
  "company_logo_url": "https://acme.com/logo.png",
  "company_size_linkedin": "51-200",
  "company_linkedin_followers": 1500,
  "company_tags": [
    "technology",
    "saas"
  ],
  "company_sources": [
    "linkedin",
    "website"
  ],
  "company_qualified": "qualified",
  "company_custom_tags_ws": [
    "vip",
    "partner"
  ]
}'

Response Parameters

company_main_id (uuid)

UUID of the company (found, created, or updated). null only if operation failed.

Example: 00000000-0000-0000-0000-000000000000

company_workspace_id (uuid)

UUID of the company-workspace connection (found, created, or updated). Only populated if workspace_id was provided in request. null if workspace operation not performed or failed.

Example: 00000000-0000-0000-0000-000000000000

status_company (enum)

Status of the company record operation. Can be "created" (new company created), "updated" (existing company updated), or null (operation failed). Note: Unlike /company_push, this endpoint never returns "found" - it always updates if found.

Example: updated
Possible values: created, updated

status_company_workspace (enum)

Status of the workspace connection operation. Can be "created" (new workspace connection created), "updated" (existing workspace connection updated), or null (not performed or failed). Only relevant when workspace_id provided in request. Note: Always updates if workspace connection exists.

Example: updated
Possible values: created, updated

error (string)

Error message if any operation failed. null if all operations succeeded.

Response
{
  "company_main_id": "283f9e1f-89dd-4032-8e02-65acc6856ed1",
  "company_workspace_id": "00000000-0000-0000-0000-000000000000",
  "status_company": "updated",
  "status_company_workspace": "updated",
  "error": null
}
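The difference between /company_push and /company_push_patch comes down to what happens when the company already exists. The Python sketch below encodes that choice and the status values each endpoint is documented to return; the function names and the overwrite_existing flag are assumptions for illustration.

```python
# Status values each endpoint is documented to return (null is also possible).
EXPECTED_STATUSES = {
    "/company_push": {"found", "created"},
    "/company_push_patch": {"created", "updated"},
}


def choose_push_endpoint(overwrite_existing):
    """Pick the endpoint: patch updates existing records, push leaves them as found."""
    return "/company_push_patch" if overwrite_existing else "/company_push"


def is_expected_status(endpoint, status):
    """Check a status_company or status_company_workspace value against the docs."""
    return status is None or status in EXPECTED_STATUSES[endpoint]


endpoint = choose_push_endpoint(overwrite_existing=True)
print(endpoint, is_expected_status(endpoint, "updated"))
```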

POST /company_delete_fields

Delete individual fields, identifier values, or array entries from an existing company record.

Required: The following fields are mandatory

company_id

Request Body

company_id (required)
uuid

The unique identifier of the company

Example: 00000000-0000-0000-0000-000000000000

company_workspace_id (client_data)
uuid

The workspace identifier

Example: 00000000-0000-0000-0000-000000000000

company_name (optional)
string

Company name to delete

Example: Acme Corporation

company_domain (optional)
string

Company domain to delete

Example: acme.com

company_linkedin (optional)
string

Company LinkedIn URL to delete

Example: https://linkedin.com/company/acme

company_email (optional)
string

Company email address to delete

Example: info@acme.com

company_phone (optional)
string

Company phone number to delete

Example: +491234567890

company_instagram (optional)
string

Company Instagram URL to delete

Example: https://instagram.com/acme

company_facebook (optional)
string

Company Facebook URL to delete

Example: https://facebook.com/acme

company_xing (optional)
string

Company XING URL to delete

Example: https://xing.com/companies/acme

company_pinterest (optional)
string

Company Pinterest URL to delete

Example: https://pinterest.com/acme

company_tiktok (optional)
string

Company TikTok URL to delete

Example: https://tiktok.com/@acme

company_youtube (optional)
string

Company YouTube URL to delete

Example: https://youtube.com/@acme

company_twitter (optional)
string

Company Twitter/X URL to delete

Example: https://twitter.com/acme

company_name_cleaned (optional)
boolean

Set to true to delete the cleaned company name field, false to keep it

Example: false
Possible values: true, false

company_legal_form (optional)
boolean

Set to true to delete the company legal form field, false to keep it

Example: false
Possible values: true, false

b2b_b2c (optional)
boolean

Set to true to delete the business model field, false to keep it

Example: false
Possible values: true, false

company_imprint_name (optional)
boolean

Set to true to delete the company imprint name field, false to keep it

Example: false
Possible values: true, false

company_street (optional)
boolean

Set to true to delete the company street field, false to keep it

Example: false
Possible values: true, false

company_street_nr (optional)
boolean

Set to true to delete the company street number field, false to keep it

Example: false
Possible values: true, false

company_city (optional)
boolean

Set to true to delete the company city field, false to keep it

Example: false
Possible values: true, false

company_zip (optional)
boolean

Set to true to delete the company zip code field, false to keep it

Example: false
Possible values: true, false

company_region (optional)
boolean

Set to true to delete the company region field, false to keep it

Example: false
Possible values: true, false

company_country (optional)
boolean

Set to true to delete the company country field, false to keep it

Example: false
Possible values: true, false

company_steuer_nr (optional)
boolean

Set to true to delete the company tax number field, false to keep it

Example: false
Possible values: true, false

company_vat_nr (optional)
boolean

Set to true to delete the company VAT number field, false to keep it

Example: false
Possible values: true, false

company_register_nr (optional)
boolean

Set to true to delete the company registration number field, false to keep it

Example: false
Possible values: true, false

employees_research (optional)
boolean

Set to true to delete the employees research field, false to keep it

Example: false
Possible values: true, false

employees_linkedin (optional)
boolean

Set to true to delete the employees LinkedIn field, false to keep it

Example: false
Possible values: true, false

company_founded_year (optional)
boolean

Set to true to delete the company founded year field, false to keep it

Example: false
Possible values: true, false

description (optional)
boolean

Set to true to delete the company description field, false to keep it

Example: false
Possible values: true, false

company_logo_url (optional)
boolean

Set to true to delete the company logo URL field, false to keep it

Example: false
Possible values: true, false

company_size_linkedin (optional)
boolean

Set to true to delete the company size LinkedIn field, false to keep it

Example: false
Possible values: true, false

company_linkedin_followers (optional)
boolean

Set to true to delete the LinkedIn followers field, false to keep it

Example: false
Possible values: true, false

company_tags (optional)
enum: text (array)

Array of company tag values to delete. Every value listed is removed from the stored company_tags array.

Example: ["technology","saas"]

company_sources (optional)
enum: data_sources (array)

Array of data source values to delete. Every value listed is removed from the stored company_sources array.

Example: ["linkedin","apollo"]
Allowed values: lusha, clay, apollo, north_data, d7_lead_finder, storeleads, build_with, sales_navigator

company_qualified (optional)
boolean

Set to true to delete the qualification status field, false to keep it

Example: false
Possible values: true, false

company_custom_tags_ws (optional)
enum: text (array)

Array of custom workspace tag values to delete. Every value listed is removed from the stored company_custom_tags_ws array.

Example: ["vip","partner"]

POST
curl -X POST "https://web-production-603a8.up.railway.app/company_delete_fields" \
  -H "api_key: {{your_api_key}}" \
  -H "Content-Type: application/json" \
  -d '{
  "company_id": "00000000-0000-0000-0000-000000000000",
  "company_workspace_id": "00000000-0000-0000-0000-000000000000",
  "company_name": "Acme Corporation",
  "company_domain": "acme.com",
  "company_linkedin": "https://linkedin.com/company/acme",
  "company_email": "info@acme.com",
  "company_phone": "+491234567890",
  "company_instagram": "https://instagram.com/acme",
  "company_facebook": "https://facebook.com/acme",
  "company_xing": "https://xing.com/companies/acme",
  "company_pinterest": "https://pinterest.com/acme",
  "company_tiktok": "https://tiktok.com/@acme",
  "company_youtube": "https://youtube.com/@acme",
  "company_twitter": "https://twitter.com/acme",
  "company_name_cleaned": false,
  "company_legal_form": false,
  "b2b_b2c": false,
  "company_imprint_name": false,
  "company_street": false,
  "company_street_nr": false,
  "company_city": false,
  "company_zip": false,
  "company_region": false,
  "company_country": false,
  "company_steuer_nr": false,
  "company_vat_nr": false,
  "company_register_nr": false,
  "employees_research": false,
  "employees_linkedin": false,
  "company_founded_year": false,
  "description": false,
  "company_logo_url": false,
  "company_size_linkedin": false,
  "company_linkedin_followers": false,
  "company_tags": [
    "technology",
    "saas"
  ],
  "company_sources": [
    "linkedin",
    "apollo"
  ],
  "company_qualified": false,
  "company_custom_tags_ws": [
    "vip",
    "partner"
  ]
}'
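The request body mixes three kinds of input: identifier values to delete (strings), scalar fields to clear (boolean flags), and values to remove from array fields. A Python sketch that keeps these apart is shown below. build_delete_fields_payload and its parameter names are hypothetical; flags are only sent as true, on the assumption that omitting a flag has the same effect as sending false.

```python
def build_delete_fields_payload(company_id, company_workspace_id=None,
                                identifiers=None, clear_fields=(),
                                remove_from_arrays=None):
    """Assemble a /company_delete_fields request body."""
    payload = {"company_id": company_id}
    if company_workspace_id is not None:
        payload["company_workspace_id"] = company_workspace_id
    # Identifier values to delete, e.g. {"company_domain": "acme.com"}.
    payload.update(identifiers or {})
    # Scalar fields are cleared by sending true for the field name.
    for field in clear_fields:
        payload[field] = True
    # Values to remove from array fields, e.g. {"company_tags": ["saas"]}.
    payload.update(remove_from_arrays or {})
    return payload


body = build_delete_fields_payload(
    "00000000-0000-0000-0000-000000000000",
    identifiers={"company_domain": "acme.com"},
    clear_fields=["company_zip", "description"],
    remove_from_arrays={"company_tags": ["saas"]},
)
print(body)
```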

Response Parameters

success (boolean)

Indicates whether the operation completed successfully. true = operation completed (even if no fields were deleted), false = operation failed due to error.

Example: true

message (string)

Detailed message describing what operations were performed. Success format: "Successfully completed: <list of operations>". Message may contain:

  "Set N fields to NULL" (boolean fields set to NULL in db_companies_main)
  "Removed N items from company_tags" (tags removed from company_tags array)
  "Removed N items from company_sources" (sources removed from company_sources array)
  "Deleted N identifier records" (identifier records deleted from db_companies_dt_identifiers)
  "Workspace: Set company_qualified to NULL" (workspace field set to NULL)
  "Workspace: Removed N custom tags" (custom tags removed from workspace)

Example: Successfully completed: Set 3 fields to NULL; Removed 2 items from company_tags; Deleted 1 identifier records; Workspace: Set company_qualified to NULL, Removed 2 custom tags

Response
{
  "success": true,
  "message": "Successfully completed: Set 3 fields to NULL; Removed 2 items from company_tags; Deleted 1 identifier records; Workspace: Set company_qualified to NULL, Removed 2 custom tags"
}
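If the individual operations in message need to be inspected, the string can be split along the separators visible in the example above. The Python sketch below assumes that format (operations separated by "; ", workspace operations separated by ", " after a "Workspace: " label); the format is inferred from the example and may not cover every message the API produces. parse_delete_message is a hypothetical helper.

```python
PREFIX = "Successfully completed: "
WORKSPACE_LABEL = "Workspace: "


def parse_delete_message(message):
    """Split a success message into a flat list of operation strings."""
    if not message.startswith(PREFIX):
        return []
    operations = []
    for part in message[len(PREFIX):].split("; "):
        part = part.strip()
        if part.startswith(WORKSPACE_LABEL):
            # Workspace operations are comma-separated after the label.
            for item in part[len(WORKSPACE_LABEL):].split(", "):
                operations.append(WORKSPACE_LABEL + item.strip())
        elif part:
            operations.append(part)
    return operations


example = ("Successfully completed: Set 3 fields to NULL; "
           "Removed 2 items from company_tags; Deleted 1 identifier records; "
           "Workspace: Set company_qualified to NULL, Removed 2 custom tags")
for operation in parse_delete_message(example):
    print(operation)
```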

People

Manage people and contact records. Create, read, update, and delete person information including names, email addresses, phone numbers, and associated company relationships.

GET /contact_get

Get a single person (contact) record by its lead ID.

Required: The following fields are mandatory

lead_id

Query Parameters

lead_id (required)
uuid

The unique identifier of the lead/contact

Example: 00000000-0000-0000-0000-000000000000

workspace_id (client_data)
uuid

The workspace identifier

Example: 00000000-0000-0000-0000-000000000000

company_id
uuid

Company identifier

Example: 00000000-0000-0000-0000-000000000000

GET
curl -X GET "https://web-production-603a8.up.railway.app/contact_get?lead_id=00000000-0000-0000-0000-000000000000&workspace_id=00000000-0000-0000-0000-000000000000&company_id=00000000-0000-0000-0000-000000000000" \
  -H "api_key: {{your_api_key}}" \
  -H "Content-Type: application/json"

Response Parameters

people_id (string)

Returns person-uuid-456

companies_main_id (string)

Returns company-uuid-789

lead_position (string)

Returns Chief Executive Officer

lead_position_cleaned (string)

Returns CEO

lead_seniority (string)

Returns C-Level

lead_departement (string)

Returns Management

still_at_company (string)

Returns yes

lead_start_date (string)

Returns 2020-01-15

lead_end_date (object)

Object containing nested data

lead_seniority_enum (string)

Returns c_level

lead_departement_enum (string)

Returns management

lead_position_clean_plural_dativ (string)

Returns CEOs

lead_position_clean_plural_nominativ (string)

Returns CEOs

lead_summary (string)

Returns Experienced executive with 15 years in tech

lead_sources (array)

Array of values

db_leads_created_at (string)

Returns 2024-02-10T09:15:00Z

db_leads_updated_at (string)

Returns 2024-03-25T11:20:00Z

person_first_name (string)

Returns John

contact_first_name_cleaned (string)

Returns John

person_last_name (string)

Returns Doe

contact_last_name_cleaned (string)

Returns Doe

person_gender (string)

Returns male

person_language (string)

Returns de

contact_estimated_birth_year (string)

Returns 1980

contact_birth_year (string)

Returns 1982

contact_birth_date (string)

Returns 1982-05-15

person_country (string)

Returns Germany

person_city (string)

Returns Berlin

linkedin_cv (string)

Returns Extensive experience in software development and leadership

linkedin_volunteerings (string)

Returns Board member at Tech for Good

started_education_linkedin (string)

Returns 2000

first_job_start_linkedin (string)

Returns 2004

contact_location (string)

Returns Berlin, Germany

contact_academic_title (string)

Returns Dr.

person_state (string)

Returns Berlin

person_native_german (string)

Returns yes

person_scooling_country (string)

Returns Germany

contact_linkedin_image_url (string)

Returns https://media.linkedin.com/profile.jpg

person_linkedin_followers (string)

Returns 2500

person_linkedin_connections (string)

Returns 500+

db_people_created_at (string)

Returns 2024-01-05T08:30:00Z

db_people_updated_at (string)

Returns 2024-03-18T10:45:00Z

contact_linkedins (array)

Array of values

contact_xings (array)

Array of values

contact_emails_valid (array)

Array of values

contact_emails_invalid (array)

Array of values

contact_emails_catch_all (array)

Array of values

contact_emails_wrong (array)

Array of values

contact_emails_unsure (array)

Array of values

lead_workspace_id (string)

Returns lead-ws-uuid-345

lead_qualified_ws (string)

Returns yes

db_leads_workspace_created_atstring

Returns 2024-02-15T13:00:00Z

db_leads_workspace_updated_atstring

Returns 2024-03-26T15:30:00Z

Response
{
  "people_id": "person-uuid-456",
  "companies_main_id": "company-uuid-789",
  "lead_position": "Chief Executive Officer",
  "lead_position_cleaned": "CEO",
  "lead_seniority": "C-Level",
  "lead_departement": "Management",
  "still_at_company": "yes",
  "lead_start_date": "2020-01-15",
  "lead_end_date": null,
  "lead_seniority_enum": "c_level",
  "lead_departement_enum": "management",
  "lead_position_clean_plural_dativ": "CEOs",
  "lead_position_clean_plural_nominativ": "CEOs",
  "lead_summary": "Experienced executive with 15 years in tech",
  "lead_sources": [
    "linkedin",
    "company_website"
  ],
  "db_leads_created_at": "2024-02-10T09:15:00Z",
  "db_leads_updated_at": "2024-03-25T11:20:00Z",
  "person_first_name": "John",
  "contact_first_name_cleaned": "John",
  "person_last_name": "Doe",
  "contact_last_name_cleaned": "Doe",
  "person_gender": "male",
  "person_language": "de",
  "contact_estimated_birth_year": "1980",
  "contact_birth_year": "1982",
  "contact_birth_date": "1982-05-15",
  "person_country": "Germany",
  "person_city": "Berlin",
  "linkedin_cv": "Extensive experience in software development and leadership",
  "linkedin_volunteerings": "Board member at Tech for Good",
  "started_education_linkedin": "2000",
  "first_job_start_linkedin": "2004",
  "contact_location": "Berlin, Germany",
  "contact_academic_title": "Dr.",
  "person_state": "Berlin",
  "person_native_german": "yes",
  "person_scooling_country": "Germany",
  "contact_linkedin_image_url": "https://media.linkedin.com/profile.jpg",
  "person_linkedin_followers": "2500",
  "person_linkedin_connections": "500+",
  "db_people_created_at": "2024-01-05T08:30:00Z",
  "db_people_updated_at": "2024-03-18T10:45:00Z",
  "contact_linkedins": [
    "https://linkedin.com/in/johndoe",
    "https://linkedin.com/in/john-doe"
  ],
  "contact_xings": [
    "https://xing.com/profile/johndoe"
  ],
  "contact_emails_valid": [
    "john@example.com",
    "j.doe@example.com"
  ],
  "contact_emails_invalid": [
    "oldaddress@defunct.com"
  ],
  "contact_emails_catch_all": [
    "info@example.com"
  ],
  "contact_emails_wrong": [],
  "contact_emails_unsure": [
    "john.doe@maybe.com"
  ],
  "lead_workspace_id": "lead-ws-uuid-345",
  "lead_qualified_ws": "yes",
  "db_leads_workspace_created_at": "2024-02-15T13:00:00Z",
  "db_leads_workspace_updated_at": "2024-03-26T15:30:00Z"
}
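
Because email addresses come back in separate arrays by verification status, a client usually has to decide which one to use. The sketch below shows one possible policy; the priority order (valid, then catch-all, then unsure) and the helper name pick_email are assumptions, not behavior defined by the API.

```python
from typing import Optional

# Assumed preference order; invalid and wrong addresses are never returned.
EMAIL_PRIORITY = (
    "contact_emails_valid",
    "contact_emails_catch_all",
    "contact_emails_unsure",
)


def pick_email(contact: dict) -> Optional[str]:
    """Return the first address from the highest-priority non-empty array."""
    for key in EMAIL_PRIORITY:
        emails = contact.get(key) or []
        if emails:
            return emails[0]
    return None
```

With the example response above, this would return the first entry of contact_emails_valid.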

POST/contact_lookup

Look up whether a lead and a person exist, and push them if they do not.

Required: At least one of the following conditions must be met

contact_linkedin OR contact_xing OR contact_email_valid OR contact_email_catch_all OR contact_email_invalid OR contact_email_unsure OR (company_id AND (contact_first_name OR contact_first_name_cleaned) AND (contact_last_name OR contact_last_name_cleaned))
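
The condition above can be checked on the client before a request is sent. The following Python sketch mirrors the expression; the function name has_identifier is hypothetical, and treating empty strings as missing is an assumption.

```python
# Fields that identify a contact on their own.
DIRECT_IDENTIFIERS = (
    "contact_linkedin",
    "contact_xing",
    "contact_email_valid",
    "contact_email_catch_all",
    "contact_email_invalid",
    "contact_email_unsure",
)


def has_identifier(body: dict) -> bool:
    """Return True if the request body satisfies the documented condition."""

    def present(key: str) -> bool:
        # Assumption: None and empty strings count as "not provided".
        return bool(body.get(key))

    if any(present(key) for key in DIRECT_IDENTIFIERS):
        return True
    has_first = present("contact_first_name") or present("contact_first_name_cleaned")
    has_last = present("contact_last_name") or present("contact_last_name_cleaned")
    return present("company_id") and has_first and has_last
```

A body with only company_id and a first name fails the check, because the name-based branch needs a first name, a last name, and a company together.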

Request Body

workspace_idclient_data
uuid

The workspace identifier

Example: 00000000-0000-0000-0000-000000000000
company_idat least 1 required
uuid

Company identifier

Example: 00000000-0000-0000-0000-000000000000
contact_linkedinat least 1 required
string

Contact LinkedIn URL

Example: https://linkedin.com/in/max-mueller
contact_xingat least 1 required
string

Contact XING URL

Example: https://xing.com/profile/max-mueller
contact_email_validat least 1 required
string

Valid email address

Example: max.mueller@example.com
contact_email_catch_allat least 1 required
string

Catch-all email address

Example: info@example.com
contact_email_invalidat least 1 required
string

Invalid email address

Example: invalid@example.com
contact_email_unsureat least 1 required
string

Unsure email address

Example: unsure@example.com
contact_first_nameat least 1 required
string

Contact first name

Example: Max
contact_first_name_cleanedat least 1 required
string

Cleaned contact first name

Example: Max
contact_last_nameat least 1 required
string

Contact last name

Example: Müller
contact_last_name_cleanedat least 1 required
string

Cleaned contact last name

Example: Mueller
POST
curl -X POST "https://web-production-603a8.up.railway.app/contact_lookup" \
  -H "api_key: {{your_api_key}}" \
  -H "Content-Type: application/json" \
  -d '{
  "workspace_id": "00000000-0000-0000-0000-000000000000",
  "company_id": "00000000-0000-0000-0000-000000000000",
  "contact_linkedin": "https://linkedin.com/in/max-mueller",
  "contact_xing": "https://xing.com/profile/max-mueller",
  "contact_email_valid": "max.mueller@example.com",
  "contact_email_catch_all": "info@example.com",
  "contact_email_invalid": "invalid@example.com",
  "contact_email_unsure": "unsure@example.com",
  "contact_first_name": "Max",
  "contact_first_name_cleaned": "Max",
  "contact_last_name": "Müller",
  "contact_last_name_cleaned": "Mueller"
}'

Response Parameters

lead_iduuid

UUID of the lead in db_leads table. Represents the connection between a person and a company. Returns UUID if lead found, null if not found.

Example: 00000000-0000-0000-0000-000000000000
person_iduuid

UUID of the person in db_people table. Returns UUID if person found, null if not found.

Example: 00000000-0000-0000-0000-000000000000
lead_workspace_iduuid

UUID of the lead-workspace connection in db_leads_workspace table. Only populated if workspace_id was provided in request. Returns UUID if workspace connection exists, null if not found.

Example: 00000000-0000-0000-0000-000000000000
people_workspace_iduuid

UUID of the people-workspace connection in db_people_workspace table. Only populated if workspace_id was provided in request. Returns UUID if workspace connection exists, null if not found.

Example: 00000000-0000-0000-0000-000000000000
errorstring

Error message if lookup failed. null if no error occurred.

Response
{
  "lead_id": "lead-uuid-123",
  "person_id": "person-uuid-456",
  "lead_workspace_id": "lead-ws-uuid-789",
  "people_workspace_id": "people-ws-uuid-012",
  "error": null
}
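
A caller typically needs to know which of the four identifiers were resolved. The sketch below turns a lookup response into a set of booleans; the helper name summarize_lookup is hypothetical, and raising an exception on a non-null error is a design choice of this example rather than something the API prescribes.

```python
# Identifier fields documented in the /contact_lookup response.
ID_FIELDS = ("lead_id", "person_id", "lead_workspace_id", "people_workspace_id")


def summarize_lookup(response: dict) -> dict:
    """Map each identifier field to True if a record was found, else False."""
    error = response.get("error")
    if error is not None:
        raise RuntimeError(f"contact_lookup failed: {error}")
    # A null (None) identifier means the record was not found.
    return {field: response.get(field) is not None for field in ID_FIELDS}
```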

POST/contact_push

Push new contact data to database.

Required: At least one of the following conditions must be met

contact_linkedin OR contact_xing OR contact_email_valid OR contact_email_catch_all OR contact_email_invalid OR contact_email_unsure OR (company_id AND (contact_first_name OR contact_first_name_cleaned) AND (contact_last_name OR contact_last_name_cleaned))

Request Body

workspace_idclient_data
uuid

The workspace identifier

Example: 00000000-0000-0000-0000-000000000000
company_idat least 1 required
uuid

Company identifier

Example: 00000000-0000-0000-0000-000000000000
contact_email_validat least 1 required
string

Valid email address

Example: max.mueller@example.com
contact_email_catch_allat least 1 required
string

Catch-all email address

Example: info@example.com
contact_email_invalidat least 1 required
string

Invalid email address

Example: invalid@example.com
contact_email_unsureat least 1 required
string

Unsure email address

Example: unsure@example.com
contact_linkedinat least 1 required
string

Contact LinkedIn URL

Example: https://linkedin.com/in/max-mueller
contact_xingat least 1 required
string

Contact XING URL

Example: https://xing.com/profile/max-mueller
contact_first_nameat least 1 required
string

Contact first name

Example: Max
contact_first_name_cleanedat least 1 required
string

Cleaned contact first name

Example: Max
contact_last_nameat least 1 required
string

Contact last name

Example: Müller
contact_last_name_cleanedat least 1 required
string

Cleaned contact last name

Example: Mueller
contact_genderoptional
string

Contact gender

Example: Male
contact_languageoptional
string

Contact language

Example: German
contact_estimated_birth_yearoptional
int2

Estimated birth year

Example: 1990
contact_birth_yearoptional
int2

Birth year

Example: 1990
contact_birth_dateoptional
string

Birth date

Example: 1990-05-15
person_countryoptional
string

Person country

Example: Germany
person_cityoptional
string

Person city

Example: München
linkedin_cvoptional
jsonb

LinkedIn CV data

Example: {"experience":[{"company":"Example GmbH","position":"Sales Manager","duration":"2020-2023"}]}
linkedin_volunteeringsoptional
jsonb

LinkedIn volunteering activities

Example: {"organizations":["Non-Profit Organization"]}
started_education_linkedinoptional
int8

Year education started, from LinkedIn

Example: 2010
first_job_start_linkedinoptional
int8

Year the first job started, from LinkedIn

Example: 2015
contact_locationoptional
string

Contact location

Example: München, Bayern, Germany
contact_academic_titleoptional
string

Academic title

Example: Dr.
person_stateoptional
string

Person state/region

Example: Bayern
person_native_germanoptional
string

Native German speaker indicator

Example: true
person_scooling_countryoptional
string

Schooling country

Example: Germany
contact_linkedin_image_urloptional
string

LinkedIn profile image URL

Example: https://media.licdn.com/dms/image/example/profile.jpg
person_linkedin_followersoptional
int8

Number of LinkedIn followers

Example: 500
person_linkedin_connectionsoptional
int8

Number of LinkedIn connections

Example: 300
lead_positionoptional
string

Job position

Example: Sales Manager
lead_position_cleanedoptional
string

Cleaned job position

Example: Sales Manager
lead_seniorityoptional
string

Seniority level

Example: Manager
lead_departementoptional
string

Department

Example: Sales
still_at_companyoptional
bool

Still employed at company

Example: true
Possible values: true, false
lead_start_dateoptional
date

Position start date

Example: 2020-01-01
lead_end_dateoptional
date

Position end date

Example: 2023-12-31
lead_seniority_enumoptional
enum: contact_seniority

Seniority level enum

Example: manager
Allowed values:
c-level, geschäftsführung, head, manager, entry, director, partner, president, intern
lead_departement_enumoptional
enum: contact_departement

Department enum

Example: sales
Allowed values:
marketing, sales, geschäftsführung, procurement, legal, accounting, finance
lead_position_clean_plural_dativoptional
string

Position in German dative plural form

Example: Vertriebsleitern
lead_position_clean_plural_nominativoptional
string

Position in German nominative plural form

Example: Vertriebsleiter
lead_summaryoptional
string

Lead summary

Example: Experienced sales professional with 10+ years in B2B software sales
lead_sourcesoptional
enum: data_sources (array)

Lead sources

Example: ["apollo","sales_navigator"]
Allowed values:
lusha, clay, apollo, north_data, d7_lead_finder, storeleads, build_with, sales_navigator
lead_qualified_wsoptional
enum: pending_boolean

Workspace qualification status

Example: qualified
Allowed values:
qualified, pending, not_qualified
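
Enum-typed fields can be validated locally before a push. The following sketch is illustrative: the value sets are transcribed from the Allowed values lists above and should be treated as a snapshot that may be incomplete, and the function name invalid_enum_fields is hypothetical.

```python
from typing import List

# Single-value enum fields and their allowed values (transcribed from this page).
ALLOWED_VALUES = {
    "lead_seniority_enum": {
        "c-level", "geschäftsführung", "head", "manager", "entry",
        "director", "partner", "president", "intern",
    },
    "lead_departement_enum": {
        "marketing", "sales", "geschäftsführung", "procurement",
        "legal", "accounting", "finance",
    },
    "lead_qualified_ws": {"qualified", "pending", "not_qualified"},
}

# lead_sources is an array, so every element is checked.
ALLOWED_LEAD_SOURCES = {
    "lusha", "clay", "apollo", "north_data", "d7_lead_finder",
    "storeleads", "build_with", "sales_navigator",
}


def invalid_enum_fields(body: dict) -> List[str]:
    """Return the names of enum fields whose values are not in the allowed sets."""
    problems = []
    for field, allowed in ALLOWED_VALUES.items():
        if field in body and body[field] not in allowed:
            problems.append(field)
    sources = body.get("lead_sources") or []
    if any(source not in ALLOWED_LEAD_SOURCES for source in sources):
        problems.append("lead_sources")
    return problems
```

An empty result means every enum field that is present uses a listed value; fields that are absent are not reported.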
POST
curl -X POST "https://web-production-603a8.up.railway.app/contact_push" \
  -H "api_key: {{your_api_key}}" \
  -H "Content-Type: application/json" \
  -d '{
  "workspace_id": "00000000-0000-0000-0000-000000000000",
  "company_id": "00000000-0000-0000-0000-000000000000",
  "contact_email_valid": "max.mueller@example.com",
  "contact_email_catch_all": "info@example.com",
  "contact_email_invalid": "invalid@example.com",
  "contact_email_unsure": "unsure@example.com",
  "contact_linkedin": "https://linkedin.com/in/max-mueller",
  "contact_xing": "https://xing.com/profile/max-mueller",
  "contact_first_name": "Max",
  "contact_first_name_cleaned": "Max",
  "contact_last_name": "Müller",
  "contact_last_name_cleaned": "Mueller",
  "contact_gender": "Male",
  "contact_language": "German",
  "contact_estimated_birth_year": 1990,
  "contact_birth_year": 1990,
  "contact_birth_date": "1990-05-15",
  "person_country": "Germany",
  "person_city": "München",
  "linkedin_cv": {
    "experience": [
      {
        "company": "Example GmbH",
        "position": "Sales Manager",
        "duration": "2020-2023"
      }
    ]
  },
  "linkedin_volunteerings": {
    "organizations": [
      "Non-Profit Organization"
    ]
  },
  "started_education_linkedin": 2010,
  "first_job_start_linkedin": 2015,
  "contact_location": "München, Bayern, Germany",
  "contact_academic_title": "Dr.",
  "person_state": "Bayern",
  "person_native_german": "true",
  "person_scooling_country": "Germany",
  "contact_linkedin_image_url": "https://media.licdn.com/dms/image/example/profile.jpg",
  "person_linkedin_followers": 500,
  "person_linkedin_connections": 300,
  "lead_position": "Sales Manager",
  "lead_position_cleaned": "Sales Manager",
  "lead_seniority": "Manager",
  "lead_departement": "Sales",
  "still_at_company": true,
  "lead_start_date": "2020-01-01",
  "lead_end_date": "2023-12-31",
  "lead_seniority_enum": "manager",
  "lead_departement_enum": "sales",
  "lead_position_clean_plural_dativ": "Vertriebsleitern",
  "lead_position_clean_plural_nominativ": "Vertriebsleiter",
  "lead_summary": "Experienced sales professional with 10+ years in B2B software sales",
  "lead_sources": [
    "apollo",
    "sales_navigator"
  ],
  "lead_qualified_ws": "qualified"
}'

Response Parameters

person_iduuid

UUID of the person record (either found or newly created). null only if person operation failed.

Example: 00000000-0000-0000-0000-000000000000
lead_iduuid

UUID of the lead record (either found or newly created). Lead connects a person to a company with position/role information. null if lead operation not performed or failed. Requires both person_id and company_id to be present.

Example: 00000000-0000-0000-0000-000000000000
people_workspace_iduuid

UUID of the people-workspace connection (either found or newly created). Only populated if workspace_id was provided in request. null if workspace operation not performed or failed.

Example: 00000000-0000-0000-0000-000000000000
lead_workspace_iduuid

UUID of the lead-workspace connection (either found or newly created). Only populated if workspace_id was provided in request. null if workspace operation not performed or failed.

Example: 00000000-0000-0000-0000-000000000000
status_personenum

Status of the person record operation. Can be "found" (person already existed), "created" (new person created), or null (operation failed).

Example: created
Possible values:
found, created
status_leadenum

Status of the lead record operation. Can be "found" (lead already existed), "created" (new lead created), or null (not performed or failed). Only created if both person_id and company_id exist.

Example: created
Possible values:
found, created
status_people_workspaceenum

Status of the people-workspace connection operation. Can be "found" (workspace connection already existed), "created" (new workspace connection created), or null (not performed or failed). Only relevant when workspace_id provided in request.

Example: created
Possible values:
found, created
status_lead_workspaceenum

Status of the lead-workspace connection operation. Can be "found" (workspace connection already existed), "created" (new workspace connection created), or null (not performed or failed). Only relevant when workspace_id provided in request.

Example: created
Possible values:
found, created
errorstring

Error message if any operation failed. null if all operations succeeded.

Response
{
  "person_id": "person-uuid-456",
  "lead_id": "lead-uuid-123",
  "people_workspace_id": "people-ws-uuid-012",
  "lead_workspace_id": "lead-ws-uuid-789",
  "status_person": "created",
  "status_lead": "created",
  "status_people_workspace": "created",
  "status_lead_workspace": "created",
  "error": null
}
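
The four status fields can be grouped by outcome to see at a glance what a push did. This is a minimal sketch; the function name group_by_status and the label used for null statuses are assumptions of this example.

```python
from typing import Dict, List

# Status fields documented in the /contact_push response.
STATUS_FIELDS = (
    "status_person",
    "status_lead",
    "status_people_workspace",
    "status_lead_workspace",
)


def group_by_status(response: dict) -> Dict[str, List[str]]:
    """Group record names by their status value.

    A null status means the operation was not performed or failed, so it is
    collected under the hypothetical label 'not_performed_or_failed'.
    """
    groups: Dict[str, List[str]] = {}
    for field in STATUS_FIELDS:
        status = response.get(field)
        label = status if status is not None else "not_performed_or_failed"
        # Strip the 'status_' prefix to get the record name, e.g. 'person'.
        groups.setdefault(label, []).append(field[len("status_"):])
    return groups
```

The grouping does not inspect the error field, so a caller would still check it separately.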

POST/contact_push_patch

Update existing contact data in database.

Required: At least one of the following conditions must be met

contact_linkedin OR contact_xing OR contact_email_valid OR contact_email_catch_all OR contact_email_invalid OR contact_email_unsure OR (company_id AND (contact_first_name OR contact_first_name_cleaned) AND (contact_last_name OR contact_last_name_cleaned))

Request Body

workspace_idclient_data
uuid

The workspace identifier

Example: 00000000-0000-0000-0000-000000000000
company_idat least 1 required
uuid

Company identifier

Example: 00000000-0000-0000-0000-000000000000
contact_email_validat least 1 required
string

Valid email address

Example: max.mueller@example.com
contact_email_catch_allat least 1 required
string

Catch-all email address

Example: info@example.com
contact_email_invalidat least 1 required
string

Invalid email address

Example: invalid@example.com
contact_email_unsureat least 1 required
string

Unsure email address

Example: unsure@example.com
contact_linkedinat least 1 required
string

Contact LinkedIn URL

Example: https://linkedin.com/in/max-mueller
contact_xingat least 1 required
string

Contact XING URL

Example: https://xing.com/profile/max-mueller
contact_first_nameat least 1 required
string

Contact first name

Example: Max
contact_first_name_cleanedat least 1 required
string

Cleaned contact first name

Example: Max
contact_last_nameat least 1 required
string

Contact last name

Example: Müller
contact_last_name_cleanedat least 1 required
string

Cleaned contact last name

Example: Mueller
contact_genderoptional
string

Contact gender

Example: Male
contact_languageoptional
string

Contact language

Example: German
contact_estimated_birth_yearoptional
int2

Estimated birth year

Example: 1990
contact_birth_yearoptional
int2

Birth year

Example: 1990
contact_birth_dateoptional
string

Birth date

Example: 1990-05-15
person_countryoptional
string

Person country

Example: Germany
person_cityoptional
string

Person city

Example: München
linkedin_cvoptional
jsonb

LinkedIn CV data

Example: {"experience":[{"company":"Example GmbH","position":"Sales Manager","duration":"2020-2023"}]}
linkedin_volunteeringsoptional
jsonb

LinkedIn volunteering activities

Example: {"organizations":["Non-Profit Organization"]}
started_education_linkedinoptional
int8

Year education started, from LinkedIn

Example: 2010
first_job_start_linkedinoptional
int8

Year the first job started, from LinkedIn

Example: 2015
contact_locationoptional
string

Contact location

Example: München, Bayern, Germany
contact_academic_titleoptional
string

Academic title

Example: Dr.
person_stateoptional
string

Person state/region

Example: Bayern
person_native_germanoptional
string

Native German speaker indicator

Example: true
person_scooling_countryoptional
string

Schooling country

Example: Germany
contact_linkedin_image_urloptional
string

LinkedIn profile image URL

Example: https://media.licdn.com/dms/image/example/profile.jpg
person_linkedin_followersoptional
int8

Number of LinkedIn followers

Example: 500
person_linkedin_connectionsoptional
int8

Number of LinkedIn connections

Example: 300
lead_positionoptional
string

Job position

Example: Sales Manager
lead_position_cleanedoptional
string

Cleaned job position

Example: Sales Manager
lead_seniorityoptional
string

Seniority level

Example: Manager
lead_departementoptional
string

Department

Example: Sales
still_at_companyoptional
bool

Still employed at company

Example: true
Possible values: true, false
lead_start_dateoptional
date

Position start date

Example: 2020-01-01
lead_end_dateoptional
date

Position end date

Example: 2023-12-31
lead_position_clean_plural_dativoptional
string

Position in German dative plural form

Example: Vertriebsleitern
lead_position_clean_plural_nominativoptional
string

Position in German nominative plural form

Example: Vertriebsleiter
lead_summaryoptional
string

Lead summary

Example: Experienced sales professional with 10+ years in B2B software sales
lead_sourcesoptional
enum: data_sources (array)

Lead sources

Example: ["apollo","sales_navigator"]
Allowed values:
lusha, clay, apollo, north_data, d7_lead_finder, storeleads, build_with, sales_navigator
lead_qualified_wsoptional
enum: pending_boolean

Workspace qualification status

Example: qualified
Allowed values:
qualified, pending, not_qualified
POST
curl -X POST "https://web-production-603a8.up.railway.app/contact_push_patch" \
  -H "api_key: {{your_api_key}}" \
  -H "Content-Type: application/json" \
  -d '{
  "workspace_id": "00000000-0000-0000-0000-000000000000",
  "company_id": "00000000-0000-0000-0000-000000000000",
  "contact_email_valid": "max.mueller@example.com",
  "contact_email_catch_all": "info@example.com",
  "contact_email_invalid": "invalid@example.com",
  "contact_email_unsure": "unsure@example.com",
  "contact_linkedin": "https://linkedin.com/in/max-mueller",
  "contact_xing": "https://xing.com/profile/max-mueller",
  "contact_first_name": "Max",
  "contact_first_name_cleaned": "Max",
  "contact_last_name": "Müller",
  "contact_last_name_cleaned": "Mueller",
  "contact_gender": "Male",
  "contact_language": "German",
  "contact_estimated_birth_year": 1990,
  "contact_birth_year": 1990,
  "contact_birth_date": "1990-05-15",
  "person_country": "Germany",
  "person_city": "München",
  "linkedin_cv": {
    "experience": [
      {
        "company": "Example GmbH",
        "position": "Sales Manager",
        "duration": "2020-2023"
      }
    ]
  },
  "linkedin_volunteerings": {
    "organizations": [
      "Non-Profit Organization"
    ]
  },
  "started_education_linkedin": 2010,
  "first_job_start_linkedin": 2015,
  "contact_location": "München, Bayern, Germany",
  "contact_academic_title": "Dr.",
  "person_state": "Bayern",
  "person_native_german": "true",
  "person_scooling_country": "Germany",
  "contact_linkedin_image_url": "https://media.licdn.com/dms/image/example/profile.jpg",
  "person_linkedin_followers": 500,
  "person_linkedin_connections": 300,
  "lead_position": "Sales Manager",
  "lead_position_cleaned": "Sales Manager",
  "lead_seniority": "Manager",
  "lead_departement": "Sales",
  "still_at_company": true,
  "lead_start_date": "2020-01-01",
  "lead_end_date": "2023-12-31",
  "lead_position_clean_plural_dativ": "Vertriebsleitern",
  "lead_position_clean_plural_nominativ": "Vertriebsleiter",
  "lead_summary": "Experienced sales professional with 10+ years in B2B software sales",
  "lead_sources": [
    "apollo",
    "sales_navigator"
  ],
  "lead_qualified_ws": "qualified"
}'

Response Parameters

person_iduuid

UUID of the person record (found, created, or updated). null only if person operation failed.

Example: 00000000-0000-0000-0000-000000000000
lead_iduuid

UUID of the lead record (found, created, or updated). Lead connects a person to a company with position/role information. null if lead operation not performed or failed. Requires both person_id and company_id to be present.

Example: 00000000-0000-0000-0000-000000000000
people_workspace_iduuid

UUID of the people-workspace connection (found, created, or updated). Only populated if workspace_id was provided in request. null if workspace operation not performed or failed.

Example: 00000000-0000-0000-0000-000000000000
lead_workspace_iduuid

UUID of the lead-workspace connection (found, created, or updated). Only populated if workspace_id was provided in request. null if workspace operation not performed or failed.

Example: 00000000-0000-0000-0000-000000000000
status_personenum

Status of the person record operation. Can be "created" (new person created), "updated" (existing person updated), or null (operation failed). Note: Unlike /contact_push, this endpoint never returns "found" - it always updates if found.

Example: updated
Possible values:
created, updated
status_leadenum

Status of the lead record operation. Can be "created" (new lead created), "updated" (existing lead updated), or null (not performed or failed). Note: Always updates if lead exists.

Example: updated
Possible values:
created, updated
status_people_workspaceenum

Status of the people-workspace connection operation. Can be "created" (new workspace connection created), "updated" (existing workspace connection updated), or null (not performed or failed). Only relevant when workspace_id provided in request. Note: Always updates if workspace connection exists.

Example: updated
Possible values:
created, updated
status_lead_workspaceenum

Status of the lead-workspace connection operation. Can be "created" (new workspace connection created), "updated" (existing workspace connection updated), or null (not performed or failed). Only relevant when workspace_id provided in request. Note: Always updates if workspace connection exists.

Example: updated
Possible values:
created, updated
errorstring

Error message if any operation failed. null if all operations succeeded.

Response
{
  "person_id": "person-uuid-456",
  "lead_id": "lead-uuid-123",
  "people_workspace_id": "people-ws-uuid-012",
  "lead_workspace_id": "lead-ws-uuid-789",
  "status_person": "updated",
  "status_lead": "updated",
  "status_people_workspace": "updated",
  "status_lead_workspace": "updated",
  "error": null
}

POST/contact_delete_fields

Delete individual fields or stored values from existing person and lead records.

Required: The following fields are mandatory

people_id AND lead_id

Request Body

people_idrequired
uuid

The unique identifier of the person

Example: 00000000-0000-0000-0000-000000000000
lead_idrequired
uuid

The unique identifier of the lead

Example: 00000000-0000-0000-0000-000000000000
people_workspace_idclient_data
uuid

The workspace identifier for people

Example: 00000000-0000-0000-0000-000000000000
leads_workspace_idclient_data
uuid

The workspace identifier for leads

Example: 00000000-0000-0000-0000-000000000000
contact_linkedinoptional
string

The exact LinkedIn URL value to delete from the database

Example: https://linkedin.com/in/john-doe
contact_xingoptional
string

The exact XING URL value to delete from the database

Example: https://xing.com/profile/john-doe
contact_email_validoptional
string

The exact valid email address to delete from the database

Example: john.doe@example.com
contact_email_catch_alloptional
string

The exact catch-all email address to delete from the database

Example: contact@example.com
contact_email_invalidoptional
string

The exact invalid email address to delete from the database

Example: invalid@example.com
contact_email_unsureoptional
string

The exact unsure email address to delete from the database

Example: unsure@example.com
contact_first_name_cleanedoptional
boolean

Set to true to delete the cleaned first name field, false to keep it

Example: false
Possible values: true, false
contact_last_name_cleanedoptional
boolean

Set to true to delete the cleaned last name field, false to keep it

Example: false
Possible values: true, false
person_genderoptional
boolean

Set to true to delete the gender field, false to keep it

Example: false
Possible values: true, false
person_languageoptional
boolean

Set to true to delete the language field, false to keep it

Example: false
Possible values: true, false
contact_estimated_birth_yearoptional
boolean

Set to true to delete the estimated birth year field, false to keep it

Example: false
Possible values: true, false
contact_birth_yearoptional
boolean

Set to true to delete the birth year field, false to keep it

Example: false
Possible values: true, false
contact_birth_dateoptional
boolean

Set to true to delete the birth date field, false to keep it

Example: false
Possible values: true, false
person_countryoptional
boolean

Set to true to delete the country field, false to keep it

Example: false
Possible values: true, false
person_cityoptional
boolean

Set to true to delete the city field, false to keep it

Example: false
Possible values: true, false
linkedin_cvoptional
boolean

Set to true to delete the LinkedIn CV field, false to keep it

Example: false
Possible values: true, false
linkedin_volunteeringsoptional
boolean

Set to true to delete the LinkedIn volunteerings field, false to keep it

Example: false
Possible values: true, false
started_education_linkedinoptional
boolean

Set to true to delete the education start date field, false to keep it

Example: false
Possible values: true, false
first_job_start_linkedinoptional
boolean

Set to true to delete the first job start date field, false to keep it

Example: false
Possible values: true, false
contact_locationoptional
boolean

Set to true to delete the location field, false to keep it

Example: false
Possible values: true, false
contact_academic_titleoptional
boolean

Set to true to delete the academic title field, false to keep it

Example: false
Possible values: true, false
person_stateoptional
boolean

Set to true to delete the state field, false to keep it

Example: false
Possible values: true, false
person_native_germanoptional
boolean

Set to true to delete the native German field, false to keep it

Example: false
Possible values: true, false
person_scooling_countryoptional
boolean

Set to true to delete the schooling country field, false to keep it

Example: false
Possible values: true, false
contact_linkedin_image_urloptional
boolean

Set to true to delete the LinkedIn image URL field, false to keep it

Example: false
Possible values: true, false
person_linkedin_followersoptional
boolean

Set to true to delete the LinkedIn followers field, false to keep it

Example: false
Possible values: true, false
person_linkedin_connectionsoptional
boolean

Set to true to delete the LinkedIn connections field, false to keep it

Example: false
Possible values: true, false
lead_positionoptional
boolean

Set to true to delete the position field, false to keep it

Example: false
Possible values: true, false
lead_position_cleanedoptional
boolean

Set to true to delete the cleaned position field, false to keep it

Example: false
Possible values: true, false
lead_seniorityoptional
boolean

Set to true to delete the seniority field, false to keep it

Example: false
Possible values: true, false
lead_departementoptional
boolean

Set to true to delete the department field, false to keep it

Example: false
Possible values: true, false
still_at_companyoptional
boolean

Set to true to delete the still at company field, false to keep it

Example: false
Possible values: true, false
lead_start_dateoptional
boolean

Set to true to delete the start date field, false to keep it

Example: false
Possible values: true, false
lead_end_dateoptional
boolean

Set to true to delete the end date field, false to keep it

Example: false
Possible values: true, false
lead_seniority_enumoptional
boolean

Set to true to delete the seniority enum field, false to keep it

Example: false
Possible values: true, false
lead_departement_enumoptional
boolean

Set to true to delete the department enum field, false to keep it

Example: false
Possible values: true, false
lead_position_clean_plural_dativoptional
boolean

Set to true to delete the position plural dativ field, false to keep it

Example: false
Possible values: true, false
lead_position_clean_plural_nominativoptional
boolean

Set to true to delete the position plural nominativ field, false to keep it

Example: false
Possible values: true, false
lead_summaryoptional
boolean

Set to true to delete the lead summary field, false to keep it

Example: false
Possible values: true, false
lead_sourcesoptional
enum: data_sourcesarray

Array of lead source values to delete from the database array. All inputted values in the array will be removed from the table array field

Example: ["apollo","sales_navigator"]
Allowed values:
lusha, clay, apollo, north_data, d7_lead_finder, storeleads, build_with, sales_navigator
lead_qualified_wsoptional
boolean

Set to true to delete the workspace qualification field, false to keep it

Example: false
Possible values: true, false
POST
curl -X POST "https://web-production-603a8.up.railway.app/contact_delete_fields" \
  -H "api_key: {{your_api_key}}" \
  -H "Content-Type: application/json" \
  -d '{
  "people_id": "00000000-0000-0000-0000-000000000000",
  "lead_id": "00000000-0000-0000-0000-000000000000",
  "people_workspace_id": "00000000-0000-0000-0000-000000000000",
  "leads_workspace_id": "00000000-0000-0000-0000-000000000000",
  "contact_linkedin": "https://linkedin.com/in/john-doe",
  "contact_xing": "https://xing.com/profile/john-doe",
  "contact_email_valid": "john.doe@example.com",
  "contact_email_catch_all": "contact@example.com",
  "contact_email_invalid": "invalid@example.com",
  "contact_email_unsure": "unsure@example.com",
  "contact_first_name_cleaned": false,
  "contact_last_name_cleaned": false,
  "person_gender": false,
  "person_language": false,
  "contact_estimated_birth_year": false,
  "contact_birth_year": false,
  "contact_birth_date": false,
  "person_country": false,
  "person_city": false,
  "linkedin_cv": false,
  "linkedin_volunteerings": false,
  "started_education_linkedin": false,
  "first_job_start_linkedin": false,
  "contact_location": false,
  "contact_academic_title": false,
  "person_state": false,
  "person_native_german": false,
  "person_scooling_country": false,
  "contact_linkedin_image_url": false,
  "person_linkedin_followers": false,
  "person_linkedin_connections": false,
  "lead_position": false,
  "lead_position_cleaned": false,
  "lead_seniority": false,
  "lead_departement": false,
  "still_at_company": false,
  "lead_start_date": false,
  "lead_end_date": false,
  "lead_seniority_enum": false,
  "lead_departement_enum": false,
  "lead_position_clean_plural_dativ": false,
  "lead_position_clean_plural_nominativ": false,
  "lead_summary": false,
  "lead_sources": [
    "apollo",
    "sales_navigator"
  ],
  "lead_qualified_ws": false
}'

Response Parameters

successboolean

Indicates whether the operation completed successfully. true = operation completed (even if no fields were deleted), false = operation failed due to error.

Example: true
messagestring

Detailed message describing which operations were performed.

Success format: "Successfully completed: <list of operations>". The list may contain:

  • People operations:
    • "People: Set N fields to NULL" (boolean fields set to NULL in db_people)
    • "People: Deleted N identifier records" (identifier records deleted from db_people_identifiers for LinkedIn/Xing)
  • Leads operations:
    • "Leads: Set N fields to NULL" (boolean fields set to NULL in db_leads)
    • "Leads: Removed N items from lead_sources" (sources removed from the lead_sources array)
    • "Leads: Deleted N email identifier records" (email identifier records deleted from db_leads_identifiers)
  • Workspace operations:
    • "People Workspace: No fields to update (table only contains IDs)" (the people workspace table has no deletable fields)
    • "Leads Workspace: Set lead_qualified_ws to NULL" (workspace qualification field set to NULL)

No-operations format: "No operations performed (no fields specified for deletion)".

Error format: "Database error: <error details>" or "Error: <error details>".

Example: Successfully completed: People: Set 2 fields to NULL; Leads: Set 3 fields to NULL; Leads: Removed 1 items from lead_sources; People: Deleted 1 identifier records; Leads: Deleted 2 email identifier records; Leads Workspace: Set lead_qualified_ws to NULL
Response
{
  "success": true,
  "message": "Successfully completed: People: Set 2 fields to NULL; Leads: Set 3 fields to NULL; Leads: Removed 1 items from lead_sources; People: Deleted 1 identifier records; Leads: Deleted 2 email identifier records; Leads Workspace: Set lead_qualified_ws to NULL"
}
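
A client may want the individual operations rather than the whole message string. The following is a minimal sketch, assuming the operations are separated by semicolons as in the example response; the helper name split_operations is hypothetical and not part of the API.

```python
from typing import List

# Prefix documented under "Success format" above.
SUCCESS_PREFIX = "Successfully completed: "


def split_operations(response: dict) -> List[str]:
    """Return the individual operation summaries from a response body.

    Returns an empty list for failed calls and for the
    "No operations performed ..." message.
    """
    message = response.get("message", "")
    if not response.get("success") or not message.startswith(SUCCESS_PREFIX):
        return []
    body = message[len(SUCCESS_PREFIX):]
    # Assumption: operations are joined with ";" as in the example response.
    return [part.strip() for part in body.split(";") if part.strip()]


example = {
    "success": True,
    "message": "Successfully completed: People: Set 2 fields to NULL; "
               "Leads: Removed 1 items from lead_sources",
}
operations = split_operations(example)
print(operations)
```

Because success is also true when nothing was deleted, checking for an empty list is one way to detect a call that changed nothing.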

Field Cleaning

Overview

Field cleaning is a critical data normalization process that ensures consistency, improves matching accuracy, and prevents duplicate records in the database. All incoming data is cleaned before database operations (lookup, push, push_patch).

When Field Cleaning is Applied

  • Before company lookup operations
  • Before contact/lead lookup operations
  • Before pushing new company data
  • Before pushing new contact/lead data
  • Before updating existing records (push_patch operations)

Purpose

  • Standardize data formats for accurate matching
  • Remove inconsistencies and variations
  • Enable reliable deduplication
  • Improve data quality

Pre-Processing Stage (Applied to ALL Fields)

Before any field-specific cleaning, ALL string fields undergo standardization:

1. Whitespace Stripping

Purpose: Remove leading/trailing spaces that cause matching failures

Input:
" example.com "
Output:
"example.com"

2. Quote Normalization (normalize_quotes)

Purpose: Convert all Unicode quote characters to standard ASCII quotes

Characters Replaced:

  • ’ (U+2019 - RIGHT SINGLE QUOTATION MARK) → '
  • ‘ (U+2018 - LEFT SINGLE QUOTATION MARK) → '
  • ` (U+0060 - GRAVE ACCENT/BACKTICK) → '
  • ´ (U+00B4 - ACUTE ACCENT) → '
  • “ (U+201C - LEFT DOUBLE QUOTATION MARK) → "
  • ” (U+201D - RIGHT DOUBLE QUOTATION MARK) → "
Input (smart quotes):
"O’Brien’s Company"
Output (regular apostrophes):
"O'Brien's Company"

Why This Matters:

  • Smart quotes come from copy-paste from Word, PDFs, websites
  • Database comparisons fail when quotes don't match
  • Enables consistent matching across data sources
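
The replacement list above can be expressed as a single translation table. This is a minimal sketch of what normalize_quotes might look like, not the platform's actual implementation; only the six code points listed under "Characters Replaced" are handled.

```python
# Code points taken from the "Characters Replaced" list.
QUOTE_TABLE = str.maketrans({
    "\u2019": "'",   # RIGHT SINGLE QUOTATION MARK
    "\u2018": "'",   # LEFT SINGLE QUOTATION MARK
    "\u0060": "'",   # GRAVE ACCENT / BACKTICK
    "\u00b4": "'",   # ACUTE ACCENT
    "\u201c": '"',   # LEFT DOUBLE QUOTATION MARK
    "\u201d": '"',   # RIGHT DOUBLE QUOTATION MARK
})


def normalize_quotes(value: str) -> str:
    """Replace typographic quote characters with their ASCII equivalents."""
    return value.translate(QUOTE_TABLE)


print(normalize_quotes("O\u2019Brien\u2019s Company"))  # O'Brien's Company
```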

3. Empty Field Removal

Purpose: Remove fields that are null, empty string, or whitespace-only

Removal Criteria:

  • None → Removed
  • "" → Removed
  • " " → Removed (becomes "" after strip)

Impact:

  • Reduces payload size
  • Prevents NULL constraint violations
  • Improves database performance
  • Fields not removed: 0, False, [] (valid data)

Order of Operations

  1. Strip whitespace from all string fields
  2. Normalize quotes in all string fields
  3. Convert empty/whitespace fields to None
  4. Apply field-specific cleaning (domain, LinkedIn, etc.)
  5. Remove all None/empty fields from payload
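
The five steps can be sketched as one function. This is an illustration under stated assumptions, not the actual code: the name preprocess and the field_cleaners argument are hypothetical, and the quote table repeats the six characters listed under Quote Normalization.

```python
from typing import Any, Callable, Dict, Optional

# Same six characters as in the quote normalization list.
_QUOTES = str.maketrans("\u2019\u2018\u0060\u00b4\u201c\u201d", "''''\"\"")

Cleaner = Callable[[str], Optional[str]]


def preprocess(data: Dict[str, Any],
               field_cleaners: Optional[Dict[str, Cleaner]] = None) -> Dict[str, Any]:
    cleaned: Dict[str, Any] = {}
    for key, value in data.items():
        if isinstance(value, str):
            value = value.strip()              # step 1: strip whitespace
            value = value.translate(_QUOTES)   # step 2: normalize quotes
            if value == "":
                value = None                   # step 3: empty -> None
        cleaned[key] = value
    # Step 4: field-specific cleaning (domain, LinkedIn, ...).
    for key, cleaner in (field_cleaners or {}).items():
        if isinstance(cleaned.get(key), str):
            cleaned[key] = cleaner(cleaned[key])
    # Step 5: drop None and empty strings; 0, False and [] are kept.
    return {k: v for k, v in cleaned.items() if v is not None and v != ""}


payload = {
    "company_domain": " Example.COM ",
    "company_city": "   ",
    "company_zip": None,
    "company_employees_linkedin": 0,
    "company_tags": [],
}
result = preprocess(payload, {"company_domain": str.lower})
print(result)
```

Filtering with an explicit None check rather than truthiness is what keeps 0, False and [] in the payload.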

Company Field Cleaning

The clean_company_fields(data: dict) → dict function processes company data through multiple stages.

1. Domain Cleaning (clean_domain)

Purpose: Normalize website URLs to consistent domain format for reliable matching

Detailed Algorithm:

  1. Strip Whitespace: Remove leading/trailing spaces
  2. Protocol Removal: Remove prefixes in order of priority:
    • https://www.
    • http://www.
    • https://
    • http://
    • www.
  3. Path Removal: Split by / and take only first part (domain)
  4. Query Parameter Removal: Split by ? and take only first part
  5. Null on Empty: If result is empty string → NULL

Examples:

✓ Valid Examples:

Input:
"https://www.example.com/about-us?ref=home"
Output:
"example.com"
Input:
"HTTP://WWW.COMPANY.DE/"
Output:
"company.de"
Input:
"subdomain.example.com/products"
Output:
"subdomain.example.com"
Input:
"www.test.org"
Output:
"test.org"
Input:
"https://api.service.com/v1/endpoint"
Output:
"api.service.com"

✗ Invalid (becomes NULL):

Input:
"" (empty string)
Output:
NULL
Input:
" " (whitespace only)
Output:
NULL
Input:
"https://" (no domain after protocol)
Output:
NULL

Edge Cases and Handling:

  • Subdomains: Preserved intact (e.g., blog.company.com)
  • Fragments: Removed (e.g., example.com#section → example.com)
  • Multiple Slashes: Only first part kept
  • Port Numbers: Preserved (e.g., localhost:8080)
  • International Domains: Preserved as-is

Validation Logic:

  • Does NOT validate actual domain format
  • Does NOT check TLD validity (.com, .de)
  • Does NOT perform DNS lookups
  • Simply extracts and normalizes domain portion
  • Allows localhost, 127.0.0.1

Error Handling:

  • No try-except block needed (string operations only)
  • Empty results after cleaning → NULL
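
The algorithm and edge cases above can be sketched as follows. Two points are assumptions rather than documented steps: the input is lowercased first, because the example "HTTP://WWW.COMPANY.DE/" → "company.de" implies case-insensitive handling, and fragments are cut at # to match the Edge Cases list.

```python
from typing import Optional

# Prefixes in the documented order of priority.
PREFIXES = ("https://www.", "http://www.", "https://", "http://", "www.")


def clean_domain(value: Optional[str]) -> Optional[str]:
    """Reduce a URL to its bare domain, or return None if nothing is left."""
    if value is None:
        return None
    domain = value.strip().lower()  # lowercasing is an assumption (see lead-in)
    for prefix in PREFIXES:
        if domain.startswith(prefix):
            domain = domain[len(prefix):]
            break  # only the first matching prefix is removed
    # Drop path, query string and fragment, in that order.
    for separator in ("/", "?", "#"):
        domain = domain.split(separator)[0]
    return domain or None


print(clean_domain("https://www.example.com/about-us?ref=home"))  # example.com
print(clean_domain("https://"))                                   # None
```

No domain validation is attempted, so values such as localhost:8080 pass through unchanged, in line with the Validation Logic list.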

Fields Cleaned:

  • company_domain
  • All domain identifiers in company_domains array

Impact on Matching:

Without Cleaning:

"https://www.example.com" ≠ "example.com" → Creates duplicate

With Cleaning:

"https://www.example.com" = "example.com" → Prevents duplicate

2. LinkedIn URL Cleaning (clean_linkedin_url)

Purpose: Validate and standardize LinkedIn company page URLs

Detailed Algorithm:

  1. Validation Check: URL must contain linkedin.com/company/
  2. URL Splitting: Split URL by linkedin.com/company/
  3. Slug Extraction:
    • Take everything after linkedin.com/company/
    • Remove trailing paths (split by /, take first part)
    • Remove query parameters (split by ?, take first part)
  4. Length Validation: Slug must be at least 2 characters long
  5. Reconstruction: Build URL as https://www.linkedin.com/company/{slug}
  6. Null on Failure: Set to NULL if any validation fails

Examples:

✓ Valid:

Input:
"https://www.linkedin.com/company/microsoft/"
Output:
"https://www.linkedin.com/company/microsoft"
Input:
"linkedin.com/company/google/about/"
Output:
"https://www.linkedin.com/company/google"
Input:
"https://de.linkedin.com/company/bmw-group?trk=public"
Output:
"https://www.linkedin.com/company/bmw-group"
Input:
"http://www.linkedin.com/company/apple"
Output:
"https://www.linkedin.com/company/apple"

✗ Invalid (becomes NULL):

Input:
"linkedin.com/school/stanford-university"
Reason:
Not a /company/ URL
Output:
NULL
Input:
"https://www.linkedin.com/company/a"
Reason:
Slug too short: 1 char
Output:
NULL
Input:
"linkedin.com/company/"
Reason:
No slug
Output:
NULL
Input:
"https://www.linkedin.com/in/person-name"
Reason:
Personal profile, not company
Output:
NULL
Input:
"https://facebook.com/company"
Reason:
Not LinkedIn
Output:
NULL

IMPORTANT VALIDATION RULES:

  • ONLY accepts /company/ URLs
  • REJECTS /school/ URLs → NULL (despite earlier documentation suggesting otherwise)
  • REJECTS personal profiles (/in/) → NULL
  • REJECTS showcase pages (/showcase/) → NULL

Edge Cases:

  • Locale Prefixes: Removed automatically (e.g., de.linkedin.com → www.linkedin.com)
  • Mobile URLs: Handled (e.g., m.linkedin.com → www.linkedin.com)
  • Query Parameters: All removed (e.g., ?trk=public, ?original_referer=)
  • Trailing Slashes: Removed from slug
  • Sub-pages: Removed (e.g., /about, /people, /jobs)

Slug Validation:

  • Minimum length: 2 characters
  • Can contain: letters, numbers, hyphens, underscores
  • No validation of actual company existence on LinkedIn
  • No case transformation (preserves original case)

Error Handling:

  • Try-except block catches malformed URLs
  • Any exception during processing → NULL
  • Missing parts after split → NULL
  • Empty slug after extraction → NULL
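
A minimal sketch of the six steps, assuming the check is a plain case-sensitive substring test as the algorithm describes. The real clean_linkedin_url may differ in details, for example in which exceptions it catches.

```python
from typing import Optional

MARKER = "linkedin.com/company/"


def clean_linkedin_url(value) -> Optional[str]:
    """Rebuild a canonical company URL, or return None when validation fails."""
    try:
        if MARKER not in value:                      # step 1: validation check
            return None
        slug = value.strip().split(MARKER, 1)[1]     # steps 2-3: text after the marker
        slug = slug.split("/")[0].split("?")[0]      # drop sub-pages and query string
        if len(slug) < 2:                            # step 4: length validation
            return None
        return f"https://www.linkedin.com/company/{slug}"  # step 5
    except (AttributeError, TypeError, IndexError):
        return None                                  # step 6: null on failure


print(clean_linkedin_url("https://de.linkedin.com/company/bmw-group?trk=public"))
# https://www.linkedin.com/company/bmw-group
```

Because the output is always rebuilt from the slug, locale and mobile subdomains disappear without any dedicated handling.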

Common Rejection Scenarios:

| Input Type | Example | Result | Reason |
| --- | --- | --- | --- |
| School page | linkedin.com/school/stanford | NULL | Not /company/ |
| Personal profile | linkedin.com/in/john-doe | NULL | Not /company/ |
| Showcase page | linkedin.com/showcase/product | NULL | Not /company/ |
| Short slug | linkedin.com/company/a | NULL | Slug < 2 chars |
| No slug | linkedin.com/company/ | NULL | Empty slug |
| Wrong platform | xing.com/companies/test | NULL | Not LinkedIn |

Fields Cleaned:

  • company_linkedin
  • All LinkedIn identifiers in company_linkedins array

Impact on Matching:

Without Cleaning:

"https://de.linkedin.com/company/bmw?trk=public" ≠ "linkedin.com/company/bmw" → Creates duplicate

With Cleaning:

"https://de.linkedin.com/company/bmw?trk=public" = "https://www.linkedin.com/company/bmw" → Prevents duplicate

3. Email Cleaning

⚠️ NOTE: Email cleaning is NOT implemented in the current codebase.

Emails are handled through the pre-processing stage only (whitespace stripping and quote normalization).

Current Behavior:

  • Whitespace is stripped (pre-processing)
  • Quotes are normalized (pre-processing)
  • No case transformation
  • No validation
  • Field passes through as-is after pre-processing

Actual vs Expected Behavior:

Actual (Current):

Input:
"Info@Company.COM"
Output:
"Info@Company.COM"
(no transformation)

Expected (Not Implemented):

Input:
"Info@Company.COM"
Expected Output:
"info@company.com"
(lowercase)

Fields Affected:

  • company_email
  • All email identifiers

To Implement:

  • Lowercase conversion
  • @ symbol validation
  • Email format validation
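
For illustration only, the planned behavior could look roughly like the sketch below. It is not current platform behavior; the function name normalize_email and the exact validation rules (exactly one @ with text on both sides) are assumptions.

```python
from typing import Optional


def normalize_email(value) -> Optional[str]:
    """Lowercase an email address and apply a very basic shape check."""
    if not isinstance(value, str):
        return None
    email = value.strip().lower()             # lowercase conversion
    local, at, domain = email.partition("@")  # @ symbol validation
    # Assumed format rule: exactly one "@" with non-empty text on both sides.
    if not at or not local or not domain or "@" in domain:
        return None
    return email


print(normalize_email("Info@Company.COM"))  # info@company.com
```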

4. Phone Number Cleaning

⚠️ NOTE: Phone number cleaning is NOT implemented in the current codebase.

Phone numbers are handled through the pre-processing stage only (whitespace stripping and quote normalization).

Current Behavior:

  • Whitespace is stripped (pre-processing)
  • Quotes are normalized (pre-processing)
  • No format transformation
  • No validation
  • Field passes through as-is after pre-processing

Actual vs Expected Behavior:

Actual (Current):

Input:
"+49 (30) 1234-5678"
Output:
"+49 (30) 1234-5678"
(no transformation)

Expected (Not Implemented):

Input:
"+49 (30) 1234-5678"
Expected Output:
"+493012345678"
(formatted)

Fields Affected:

  • company_phone
  • All phone identifiers

To Implement:

  • Remove non-digit characters (except +)
  • Normalize to E.164 format
  • Add + prefix for international numbers
  • Validate phone number format
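
As with email, the following is only a sketch of the planned behavior, not something the platform does today. The name normalize_phone is hypothetical. Treating a leading 00 as an international prefix and leaving national numbers without a + are assumptions; the 15-digit limit comes from E.164.

```python
import re
from typing import Optional


def normalize_phone(value) -> Optional[str]:
    """Strip formatting characters and add "+" to international numbers."""
    if not isinstance(value, str):
        return None
    raw = value.strip()
    digits = re.sub(r"\D", "", raw)   # remove every non-digit character
    if raw.startswith("00"):
        digits = digits[2:]           # assumption: 00 is an international prefix
        international = True
    else:
        international = raw.startswith("+")
    # E.164 numbers have at most 15 digits after the plus sign.
    if not digits or len(digits) > 15:
        return None
    # National numbers keep their digits; the country code is unknown here.
    return "+" + digits if international else digits


print(normalize_phone("+49 (30) 1234-5678"))  # +493012345678
```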

5. Social Media URL Cleaning

Multiple functions clean different social media platforms to consistent formats. Each platform has strict validation rules and will set the field to NULL if validation fails.

Platform Output Formats & Special Notes:

| Platform | Required Pattern | Output Format | Special Notes |
| --- | --- | --- | --- |
| Instagram | instagram.com | https://www.instagram.com/{slug} | Accepts any Instagram URL |
| Facebook | facebook.com | https://www.facebook.com/{slug} | Accepts any Facebook URL |
| Xing | xing.com/pages/ | https://www.xing.com/pages/{slug} | ONLY /pages/ URLs |
| Pinterest | pinterest.com | https://de.pinterest.com/{slug} | Always German locale |
| TikTok | tiktok.com/@ | https://www.tiktok.com/@{slug} | Requires @ symbol |
| YouTube | youtube.com | https://www.youtube.com/{slug} | Preserves path format (/c/, /channel/, /@, /user/) |
| Twitter/X | x.com | https://x.com/{slug} | ONLY x.com, NOT twitter.com |

Common Algorithm (ALL Platforms):

  1. Remove protocol (http://, https://)
  2. Remove www. prefix
  3. Remove query parameters and fragments
  4. Remove trailing slashes
  5. Convert to lowercase (except for case-sensitive platforms)
  6. Keep platform-specific path structure

Instagram

Input:
"https://www.instagram.com/company/?hl=en"
Output:
"https://www.instagram.com/company"

Facebook

Input:
"https://www.facebook.com/Page/"
Output:
"https://www.facebook.com/Page"

Xing

Input:
"https://www.xing.com/pages/name"
Output:
"https://www.xing.com/pages/name"
⚠️ /companies/ → NULL

Pinterest

Input:
"https://www.pinterest.com/boards/"
Output:
"https://de.pinterest.com/boards"
🌍 Always German locale

TikTok

Input:
"https://www.tiktok.com/@name?lang=en"
Output:
"https://www.tiktok.com/@name"
⚠️ No @ → NULL

YouTube

Input:
"https://www.youtube.com/c/Channel"
Output:
"https://www.youtube.com/c/Channel"

Twitter/X

Input:
"https://twitter.com/Handle?ref_src=twsrc"
Output:
NULL
⚠️ Must use x.com

Error Recovery:

  • No partial saves - invalid URLs become NULL
  • No fallback attempts - strict validation
  • No logging of failed URLs - silent NULL assignment
  • Fields with NULL are removed from payload before database insertion

Detailed Platform Algorithms

Each platform has its own specialized cleaning function with unique validation rules.

Pinterest (clean_pinterest)

Purpose: Validate and normalize Pinterest profile URLs

Detailed Algorithm:

  1. Validation Check: URL must contain pinterest.com
  2. URL Splitting: Split URL by pinterest.com/
  3. Slug Extraction: Take everything after pinterest.com/, remove trailing paths (split by /, take first part), remove query parameters (split by ?, take first part)
  4. Length Validation: Slug must be at least 2 characters long
  5. Reconstruction: Build URL as https://de.pinterest.com/{slug}
  6. Null on Failure: Set to NULL if any validation fails

Examples:

✓ Valid:

Input:
"https://www.pinterest.com/company_boards/"
Output:
"https://de.pinterest.com/company_boards"
Input:
"pinterest.com/nike/ideas?source=web"
Output:
"https://de.pinterest.com/nike"
Input:
"https://de.pinterest.com/cocacola"
Output:
"https://de.pinterest.com/cocacola"

✗ Invalid (becomes NULL):

Input:
"https://www.pinterest.com/p"
Reason:
Slug too short: 1 char
Output:
NULL
Input:
"pinterest.com/"
Reason:
No slug
Output:
NULL

Special Note:

  • Output ALWAYS uses de.pinterest.com (German locale)
  • Input can be from any Pinterest locale (www, de, fr, etc.)
  • This standardizes to German locale for consistency

Error Handling:

  • Try-except block catches malformed URLs
  • Any exception during processing → NULL

Fields Cleaned:

  • company_pinterest

TikTok (clean_tiktok)

Purpose: Validate and normalize TikTok profile URLs

Detailed Algorithm:

  1. Validation Check: URL must contain tiktok.com/@
  2. URL Splitting: Split URL by tiktok.com/
  3. Slug Extraction: Take everything after tiktok.com/, remove trailing paths (split by /, take first part), remove query parameters (split by ?, take first part)
  4. Length Validation: Slug must be at least 2 characters long (includes @)
  5. Reconstruction: Build URL as https://www.tiktok.com/{slug}
  6. Null on Failure: Set to NULL if any validation fails

Examples:

✓ Valid:

Input:
"https://www.tiktok.com/@companyname?lang=en"
Output:
"https://www.tiktok.com/@companyname"
Input:
"tiktok.com/@nike/video/12345"
Output:
"https://www.tiktok.com/@nike"
Input:
"https://www.tiktok.com/@cocacola"
Output:
"https://www.tiktok.com/@cocacola"

✗ Invalid (becomes NULL):

Input:
"https://www.tiktok.com/companyname"
Reason:
No @ symbol
Output:
NULL
Input:
"https://www.tiktok.com/@a"
Reason:
Slug too short: 2 chars total including @
Output:
NULL
Input:
"tiktok.com/@"
Reason:
No username after @
Output:
NULL

Special Requirements:

  • URL MUST contain tiktok.com/@
  • The @ symbol is required and preserved in the slug
  • Without @ symbol, URL is considered invalid → NULL

Error Handling:

  • Try-except block catches malformed URLs
  • Any exception during processing → NULL

Fields Cleaned:

  • company_tiktok

YouTube (clean_youtube)

Purpose: Validate and normalize YouTube channel URLs

Detailed Algorithm:

  1. Validation Check: URL must contain youtube.com
  2. URL Splitting: Split URL by youtube.com/
  3. Slug Extraction: Take everything after youtube.com/, remove trailing paths (split by /, take first part), remove query parameters (split by ?, take first part)
  4. Length Validation: Slug must be at least 2 characters long
  5. Reconstruction: Build URL as https://www.youtube.com/{slug}
  6. Null on Failure: Set to NULL if any validation fails

Examples:

✓ Valid:

Input:
"https://www.youtube.com/c/CompanyChannel"
Output:
"https://www.youtube.com/c/CompanyChannel"
Input:
"youtube.com/channel/UCxxxxxx/videos"
Output:
"https://www.youtube.com/channel/UCxxxxxx"
Input:
"https://m.youtube.com/@CompanyName?feature=share"
Output:
"https://www.youtube.com/@CompanyName"
Input:
"https://www.youtube.com/user/OldUsername"
Output:
"https://www.youtube.com/user/OldUsername"

✗ Invalid (becomes NULL):

Input:
"https://www.youtube.com/c"
Reason:
Slug too short: 1 char
Output:
NULL
Input:
"youtube.com/"
Reason:
No slug
Output:
NULL
Input:
"https://vimeo.com/channel"
Reason:
No youtube.com
Output:
NULL

Supported YouTube URL Formats:

  • /c/{channel-name} (custom channel URL)
  • /channel/{channel-id} (channel ID)
  • /@{handle} (new YouTube handle format)
  • /user/{username} (legacy username)

Error Handling:

  • Try-except block catches malformed URLs
  • Any exception during processing → NULL

Fields Cleaned:

  • company_youtube

Twitter/X (clean_twitter)

Purpose: Validate and normalize Twitter/X profile URLs

Detailed Algorithm:

  1. Validation Check: URL must contain x.com (NEW Twitter branding)
  2. URL Splitting: Split URL by x.com/
  3. Slug Extraction: Take everything after x.com/, remove trailing paths (split by /, take first part), remove query parameters (split by ?, take first part)
  4. Length Validation: Slug must be at least 2 characters long
  5. Reconstruction: Build URL as https://x.com/{slug}
  6. Null on Failure: Set to NULL if any validation fails

Examples:

✓ Valid:

Input:
"https://x.com/CompanyHandle?ref_src=twsrc"
Output:
"https://x.com/CompanyHandle"
Input:
"x.com/nike/status/12345"
Output:
"https://x.com/nike"
Input:
"https://www.x.com/cocacola"
Output:
"https://x.com/cocacola"

✗ Invalid (becomes NULL):

Input:
"https://twitter.com/CompanyHandle"
Reason:
Uses old twitter.com domain
Output:
NULL
Input:
"https://x.com/a"
Reason:
Slug too short: 1 char
Output:
NULL
Input:
"x.com/"
Reason:
No handle
Output:
NULL

IMPORTANT NOTES:

  • ONLY accepts x.com (new Twitter branding)
  • REJECTS twitter.com URLs → NULL
  • This is a strict migration to X branding
  • Old twitter.com URLs will be marked as invalid

Migration Impact:

  • Existing twitter.com URLs in database will be marked as NULL during cleaning
  • Users must provide x.com URLs for validation to pass
  • This enforces the Twitter → X rebranding

Error Handling:

  • Try-except block catches malformed URLs
  • Any exception during processing → NULL

Fields Cleaned:

  • company_twitter

6. Company Name Cleaning

⚠️ NOTE: Company name cleaning is NOT implemented in the current codebase.

Company names are handled through the pre-processing stage only (whitespace stripping and quote normalization).

Current Behavior:

  • Whitespace is stripped (pre-processing)
  • Quotes are normalized (pre-processing)
  • No legal form removal
  • No case transformation
  • Field passes through as-is after pre-processing

Actual vs Expected Behavior:

Actual (Current):

Input:
" Company Name GmbH "
Output:
"Company Name GmbH"
(whitespace stripped only)

Expected (Not Implemented):

Input:
" Company Name GmbH "
Expected Output:
"Company Name"
(legal form removed)

Fields Affected:

  • company_name_cleaned
  • All name identifiers

Validation:

None currently implemented

To Implement:

  • Remove legal form suffixes (GmbH, AG, Inc., Ltd., etc.)
  • Normalize spacing
  • Title case conversion
  • Handle compound legal forms

7-11. Other Company Fields

⚠️ NOTE: The following cleaning functions (7-11) are NOT implemented in the current codebase.

All these fields are handled through the pre-processing stage only (whitespace stripping and quote normalization).

Fields Affected:

  • company_legal_form - No standardization or mapping
  • company_street - No normalization
  • company_city - No title case conversion
  • company_zip - No format normalization
  • company_region - No transformation
  • company_country - No ISO code conversion
  • company_tags - No lowercase or deduplication
  • company_sources - No lowercase or deduplication
  • company_employees_research - No integer conversion
  • company_employees_linkedin - No integer conversion
  • company_founded_year - No integer conversion
  • company_linkedin_followers - No integer conversion

Current Behavior:

  • ALL these fields pass through with only whitespace stripping and quote normalization
  • No validation
  • No format transformation
  • No data normalization beyond pre-processing

Examples of Current Behavior:

Legal Form

Input:
"gmbh"
Current Output:
"gmbh"
(no transformation)
Expected:
"GMBH"
(not implemented)

City

Input:
" münchen "
Current Output:
"münchen"
(whitespace stripped)
Expected:
"München"
(not implemented)

ZIP Code

Input:
"10 115"
Current Output:
"10 115"
(no transformation)
Expected:
"10115"
(not implemented)

Tags

Input:
"Software, SAAS, software"
Current Output:
"Software, SAAS, software"
(no transformation)
Expected:
["software", "saas"]
(not implemented)

Validation:

None currently implemented for any of these fields

To Implement:

  • Legal form standardization and mapping
  • Address normalization (street, city, zip, region)
  • Country code ISO conversion
  • Array parsing and deduplication
  • Integer conversion and validation

Contact Field Cleaning

The clean_contact_fields(data: dict) → dict function processes contact/lead data through multiple stages.

Pre-Processing Stage (Applied to ALL Fields)

Before any field-specific cleaning, ALL string fields undergo standardization:

1. Quote Normalization (normalize_quotes)

Same as company cleaning - converts all Unicode quote characters to standard ASCII

2. Empty Field Removal

Same as company cleaning - removes None, "", and whitespace-only fields

Note:

Contact cleaning does NOT include explicit whitespace stripping in the pre-processing loop (only quote normalization), but whitespace is still handled during field-specific cleaning.

Order of Operations:

  1. Normalize quotes in all string fields
  2. Apply field-specific cleaning (LinkedIn, Xing, etc.)
  3. Remove all None/empty fields from payload

1. Email Cleaning (clean_email)

Note:

Same algorithm as company email cleaning

Examples:

Input:
"John.Doe@Company.COM"
Output:
"john.doe@company.com"
Input:
" info+sales@EXAMPLE.de "
Output:
"info+sales@example.de"

Fields Cleaned:

  • contact_email_valid
  • contact_email_invalid
  • contact_email_catch_all
  • contact_email_unsure
  • All email identifiers in respective status arrays

2. LinkedIn Profile Cleaning (clean_linkedin_profile)

Purpose: Validate and normalize LinkedIn personal profile URLs

Detailed Algorithm:

  1. Validation Check: URL must contain linkedin.com/in/
  2. URL Splitting: Split URL by linkedin.com/in/
  3. Slug Extraction: Take everything after linkedin.com/in/, remove trailing paths (split by /, take first part), remove query parameters (split by ?, take first part)
  4. Length Validation: Slug must be at least 2 characters long
  5. Reconstruction: Build URL as https://www.linkedin.com/in/{slug}
  6. Null on Failure: Set to NULL if any validation fails

Examples:

✓ Valid:

Input:
"https://www.linkedin.com/in/john-doe/"
Output:
"https://www.linkedin.com/in/john-doe"
Input:
"linkedin.com/in/jane-smith-12345678?trk=profile"
Output:
"https://www.linkedin.com/in/jane-smith-12345678"
Input:
"https://de.linkedin.com/in/max-mustermann"
Output:
"https://www.linkedin.com/in/max-mustermann"
Input:
"http://m.linkedin.com/in/person-name/details"
Output:
"https://www.linkedin.com/in/person-name"

✗ Invalid (becomes NULL):

Input:
"https://www.linkedin.com/company/microsoft"
Reason:
Company page, not /in/
Output:
NULL
Input:
"https://www.linkedin.com/in/a"
Reason:
Slug too short: 1 char
Output:
NULL
Input:
"linkedin.com/in/"
Reason:
No slug
Output:
NULL
Input:
"https://www.linkedin.com/pub/john-doe/12/345/678"
Reason:
Old /pub/ format
Output:
NULL
Input:
"https://xing.com/profile/person"
Reason:
Not LinkedIn
Output:
NULL

IMPORTANT VALIDATION RULES:

  • ONLY accepts /in/ URLs (personal profiles)
  • REJECTS /company/ URLs → NULL
  • REJECTS /school/ URLs → NULL
  • REJECTS /pub/ URLs → NULL (old public profile format)

Edge Cases:

  • Locale Prefixes: Removed automatically (e.g., de.linkedin.com → www.linkedin.com)
  • Mobile URLs: Handled (e.g., m.linkedin.com → www.linkedin.com)
  • Query Parameters: All removed (e.g., ?trk=profile, ?originalSubdomain=de)
  • Trailing Slashes: Removed from slug
  • Sub-pages: Removed (e.g., /details, /recent-activity)

Slug Validation:

  • Minimum length: 2 characters
  • Can contain: letters, numbers, hyphens, underscores
  • Supports vanity URLs (e.g., john-doe) and numeric IDs (e.g., person-12345678)
  • No validation of actual profile existence on LinkedIn
  • No case transformation (preserves original case)

Error Handling:

  • Try-except block catches malformed URLs
  • Any exception during processing → NULL
  • Missing parts after split → NULL
  • Empty slug after extraction → NULL

Impact on Matching:

Without Cleaning

"https://de.linkedin.com/in/john-doe?trk=profile"
"linkedin.com/in/john-doe"
→ Creates duplicate

With Cleaning

"https://www.linkedin.com/in/john-doe"
=
"https://www.linkedin.com/in/john-doe"
→ Prevents duplicate

Common Rejection Scenarios:

| Input Type | Example | Result | Reason |
| --- | --- | --- | --- |
| Company page | linkedin.com/company/microsoft | NULL | Not /in/ |
| School page | linkedin.com/school/stanford | NULL | Not /in/ |
| Public profile | linkedin.com/pub/john-doe/1/2/3 | NULL | Not /in/ |
| Short slug | linkedin.com/in/a | NULL | Slug < 2 chars |
| No slug | linkedin.com/in/ | NULL | Empty slug |
| Wrong platform | xing.com/profile/test | NULL | Not LinkedIn |

Fields Cleaned:

  • contact_linkedin
  • All LinkedIn identifiers in contact_linkedins array

Name Cleaning

First Name (clean_first_name)

Input:
" john-paul "
Output:
"John-Paul"
Input:
"marie-josé"
Output:
"Marie-José"

Last Name (clean_last_name)

Input:
"von müller"
Output:
"Von Müller"
Input:
"o'brien"
Output:
"O'Brien"

Fields Cleaned:

  • contact_first_name_cleaned
  • contact_last_name_cleaned
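
The source gives only input/output pairs for name cleaning, not an algorithm. All four pairs are consistent with stripping whitespace and title-casing each word, so the sketch below assumes exactly that; the helper name title_case_name is hypothetical, and the real clean_first_name and clean_last_name may apply further rules.

```python
from typing import Optional


def title_case_name(value) -> Optional[str]:
    """Strip whitespace and capitalize the first letter of every word part."""
    if not isinstance(value, str):
        return None
    # str.title() starts a new word after any non-letter, so hyphenated
    # and apostrophized names are capitalized part by part.
    name = value.strip().title()
    return name or None


for raw in (" john-paul ", "marie-josé", "von müller", "o'brien"):
    print(repr(raw), "->", repr(title_case_name(raw)))
```

One known limitation of str.title() is that it lowercases every letter after the first in each part, so a name such as McDonald would come out as Mcdonald.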

Field Normalization

Gender Normalization

Maps variations to: male, female, diverse, unknown

  • "M" → "male"
  • "weiblich" → "female"
  • "non-binary" → "diverse"

Language Normalization

Converts to ISO 639-1 codes

  • "German" → "de"
  • "english" → "en"
  • "français" → "fr"

Academic Title Normalization

Standardizes academic titles

  • "dr." → "DR"
  • "prof. dr." → "PROF DR"
  • "ph.d." → "PHD"

Seniority Normalization

Standard levels: entry, mid, senior, manager, director, vp, c-level

  • "Senior" → "senior"
  • "VP of Sales" → "vp"
  • "C-Level" → "c-level"

Department Normalization

Standard departments: sales, marketing, it, hr, finance, operations, rd, customer_success

  • "Sales & Marketing" → "sales"
  • "IT / Technology" → "it"
  • "HR" → "hr"

Validation and Null Replacement Rules

All Social Media & Profile URL Cleaning

Common rules that apply to ALL URL cleaning functions:

✓ Validation Rules

  1. Minimum slug length: 2 characters
  2. Must contain correct domain
  3. Must match expected path pattern
  4. Exception handling active
  5. No empty slugs allowed

✗ Results in NULL

  • Slug shorter than 2 chars
  • Wrong platform domain
  • Wrong path pattern
  • Processing exception
  • Empty slug after extraction

Strict Platform-Specific Requirements

| Platform | Required Pattern | Rejects | Output on Invalid |
| --- | --- | --- | --- |
| Company LinkedIn | linkedin.com/company/ | /school/, /in/, /showcase/ | NULL |
| Contact LinkedIn | linkedin.com/in/ | /company/, /school/, /pub/ | NULL |
| Company Xing | xing.com/pages/ | /people/, /profile/, /companies/ | NULL |
| Contact Xing | xing.com/people/ | /pages/, /profile/ | NULL |
| Instagram | instagram.com | Any non-Instagram domain | NULL |
| Facebook | facebook.com | Any non-Facebook domain | NULL |
| Pinterest | pinterest.com | Any non-Pinterest domain | NULL |
| TikTok | tiktok.com/@ | URLs without @ symbol | NULL |
| YouTube | youtube.com | Any non-YouTube domain | NULL |
| Twitter/X | x.com | twitter.com (old domain) | NULL |
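
Several of these rules share the same shape: a required pattern, a slug taken from the text after it, and a fixed output prefix. The sketch below shows one table-driven way to express that for five of the platforms. It is an illustration, not the platform's code: RULES and clean_profile_url are hypothetical names, company_xing is an assumed field name, and platforms whose examples keep a multi-part path (YouTube) or need extra checks (TikTok) are left out.

```python
from typing import Dict, Optional, Tuple

# field -> (required pattern, output prefix), taken from the tables in this section.
RULES: Dict[str, Tuple[str, str]] = {
    "company_linkedin": ("linkedin.com/company/", "https://www.linkedin.com/company/"),
    "contact_linkedin": ("linkedin.com/in/", "https://www.linkedin.com/in/"),
    "company_xing": ("xing.com/pages/", "https://www.xing.com/pages/"),  # assumed field name
    "company_pinterest": ("pinterest.com/", "https://de.pinterest.com/"),
    "company_twitter": ("x.com/", "https://x.com/"),
}


def clean_profile_url(field: str, value) -> Optional[str]:
    """Apply the shared slug rules for one field; None means validation failed."""
    rule = RULES.get(field)
    if rule is None or not isinstance(value, str):
        return None
    pattern, prefix = rule
    # partition() returns an empty middle element when the pattern is absent.
    _, found, rest = value.strip().partition(pattern)
    if not found:
        return None
    slug = rest.split("/")[0].split("?")[0]  # drop sub-pages and query string
    return prefix + slug if len(slug) >= 2 else None


print(clean_profile_url("contact_linkedin", "http://m.linkedin.com/in/person-name/details"))
# https://www.linkedin.com/in/person-name
```

Keeping the patterns in a table makes the differences between company and personal paths visible in one place.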

Key Insights

NULL Behavior

  • • Invalid URLs become NULL, NOT empty string
  • NULL fields are removed from payload
  • • No partial saves or fallback attempts
  • • Failed validations are silent (no errors)

Validation Strictness

  • Twitter must be x.com (NOT twitter.com)
  • TikTok must have @ symbol
  • Xing company vs personal paths are different
  • LinkedIn company vs personal paths are different
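
The validation rules above can be sketched in Python as follows. This is an illustration under stated assumptions, not the platform's actual cleaning code: the function name `clean_profile_url`, the `REQUIRED_PREFIX` table, the lowercasing of slugs, and the `domain/path/slug` output shape are all hypothetical. Only the required patterns and the "invalid becomes NULL" behaviour come from this section.

```python
import re
from typing import Optional

# Required "domain + path prefix" per platform, taken from the table above.
# The dictionary keys are hypothetical names chosen for this sketch.
REQUIRED_PREFIX = {
    "company_linkedin": "linkedin.com/company/",
    "contact_linkedin": "linkedin.com/in/",
    "company_xing": "xing.com/pages/",
    "contact_xing": "xing.com/people/",
    "tiktok": "tiktok.com/@",
    "twitter": "x.com/",
}


def clean_profile_url(raw: str, platform: str) -> Optional[str]:
    """Return a normalized profile URL, or None if any validation rule fails."""
    try:
        prefix = REQUIRED_PREFIX[platform]
        domain = prefix.split("/")[0]
        url = re.sub(r"^https?://", "", raw.strip().lower())  # drop scheme
        url = url.split("?")[0].split("#")[0]                 # drop query/fragment
        host, _, path = url.partition("/")
        # Accept the bare domain or a subdomain of it ("www.", "de.", ...).
        if host != domain and not host.endswith("." + domain):
            return None                                       # wrong platform domain
        url = domain + "/" + path
        if not url.startswith(prefix):
            return None                                       # wrong path pattern
        slug = url[len(prefix):].split("/")[0]
        if len(slug) < 2:
            return None                                       # empty or too-short slug
        return prefix + slug
    except Exception:
        return None                                           # processing exception


print(clean_profile_url("https://de.linkedin.com/in/john-doe?trk=profile", "contact_linkedin"))
print(clean_profile_url("https://twitter.com/acme", "twitter"))
```

The first call prints `linkedin.com/in/john-doe`; the second prints `None`, because the old twitter.com domain is rejected. Returning `None` instead of raising mirrors the silent-failure behaviour described above: the caller simply drops the field from the payload.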

Cleaning Impact on Matching

Why Cleaning Matters for Deduplication

Scenario 1: Company Domain Matching

Without Cleaning

Record 1: "https://www.Example.COM/about"
Record 2: "example.com"
→ MISMATCH (creates duplicate)

With Cleaning

Record 1: "example.com"
Record 2: "example.com"
→ MATCH (prevents duplicate)

Scenario 2: LinkedIn Profile Matching

Without Cleaning

Record 1: "https://de.linkedin.com/in/john-doe?trk=profile"
Record 2: "linkedin.com/in/john-doe/"
→ MISMATCH (creates duplicate)

With Cleaning

Record 1: "linkedin.com/in/john-doe"
Record 2: "linkedin.com/in/john-doe"
→ MATCH (prevents duplicate)

Scenario 3: Email Matching

Without Cleaning

Record 1: "John.Doe@Company.COM"
Record 2: "john.doe@company.com"
→ MISMATCH (creates duplicate)

With Cleaning

Record 1: "john.doe@company.com"
Record 2: "john.doe@company.com"
→ MATCH (prevents duplicate)
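
A minimal Python sketch of why cleaned values match where raw values do not. The helpers `clean_domain` and `clean_email` are hypothetical stand-ins written for this illustration; the platform's real cleaning functions may apply additional rules.

```python
import re


def clean_domain(raw: str) -> str:
    """Reduce a URL or hostname to a bare lowercase domain."""
    value = raw.strip().lower()
    value = re.sub(r"^https?://", "", value)   # drop scheme
    value = value.split("/")[0]                # drop path
    return value[4:] if value.startswith("www.") else value


def clean_email(raw: str) -> str:
    """Trim whitespace and lowercase the address."""
    return raw.strip().lower()


raw_a, raw_b = "https://www.Example.COM/about", "example.com"
print(raw_a == raw_b)                              # False: raw values differ
print(clean_domain(raw_a) == clean_domain(raw_b))  # True: cleaned values match
print(clean_email("John.Doe@Company.COM") == clean_email("john.doe@company.com"))  # True
```

Because lookups compare cleaned values, the same cleaning has to run on both the stored record and the incoming request.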

Company Lookup Priority

  1. Domain (highest priority - most unique)
  2. LinkedIn URL
  3. Email
  4. Phone
  5. Company name + address

Contact Lookup Priority

  1. Email (highest priority - most unique)
  2. LinkedIn profile URL
  3. Xing profile URL
  4. First + Last name + Company

Data Quality Benefits

Consistency

All data stored in uniform format

Searchability

Easier to query and filter

Matching Accuracy

95%+ reduction in false negatives

Storage Efficiency

Eliminates redundant variations

API Performance

Faster comparison operations

User Experience

Predictable data format in responses

Application Scope

✓ Field Cleaning is Applied In:

  • All lookup operations (company_lookup, contact_lookup)
  • All push operations (company_push, contact_push)
  • All push_patch operations (company_push_patch, contact_push_patch)
  • CSV upload processing

✗ NOT Applied In:

  • GET operations (data already cleaned in database)
  • DELETE operations (no new data)

Order of Execution

  1. Receive API request with raw data
     Incoming payload from client application
  2. Apply pre-processing
     Whitespace stripping, quote normalization, empty field removal
  3. Apply field-specific cleaning
     Domain, LinkedIn, email, phone, social media, etc.
  4. Remove NULL/empty fields from payload
     Final cleanup before database operation
  5. Proceed to database lookup/insert/update
     Cleaned data ready for database operations
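
The order of execution can be sketched as a small Python pipeline. The names `preprocess` and `clean_payload` and the `cleaners` mapping are assumptions made for this illustration. Quote normalization is left out because this section does not define it.

```python
from typing import Any, Callable, Optional

Cleaner = Callable[[str], Optional[str]]


def preprocess(payload: dict[str, Any]) -> dict[str, Any]:
    """Step 2: strip whitespace from strings and drop empty fields."""
    out = {}
    for key, value in payload.items():
        if isinstance(value, str):
            value = value.strip()
        if value not in (None, ""):
            out[key] = value
    return out


def clean_payload(payload: dict[str, Any], cleaners: dict[str, Cleaner]) -> dict[str, Any]:
    """Steps 2-4: pre-process, apply field-specific cleaners, drop NULL results."""
    cleaned = {}
    for key, value in preprocess(payload).items():
        cleaner = cleaners.get(key)
        result = cleaner(value) if cleaner and isinstance(value, str) else value
        if result not in (None, ""):      # step 4: NULL fields leave the payload
            cleaned[key] = result
    return cleaned


# Hypothetical cleaners: one normalizes, one rejects its input.
cleaners = {"company_email": str.lower, "company_tiktok": lambda v: None}
raw = {"company_email": "  Info@Example.COM ", "company_tiktok": "tiktok.com/acme",
       "company_city": "   ", "company_name": "Acme"}
print(clean_payload(raw, cleaners))
```

The result keeps only `company_email` (lowercased) and `company_name`. The rejected TikTok URL and the whitespace-only city are gone before step 5, the database operation, runs.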

Summary

Company Cleaning Functions: 11

  1. Domain cleaning
  2. LinkedIn URL cleaning
  3. Email cleaning
  4. Phone cleaning
  5. Instagram cleaning
  6. Facebook cleaning
  7. Xing cleaning
  8. Pinterest cleaning
  9. TikTok cleaning
  10. YouTube cleaning
  11. Twitter cleaning

Contact Cleaning Functions: 10

  1. Email cleaning
  2. LinkedIn profile cleaning
  3. Xing profile cleaning
  4. First name cleaning
  5. Last name cleaning
  6. Gender normalization
  7. Language normalization
  8. Academic title normalization
  9. Position cleaning
  10. Seniority/Department normalization

Applied In:

  • All lookup operations (company_lookup, contact_lookup)
  • All push operations (company_push, contact_push)
  • All push_patch operations (company_push_patch, contact_push_patch)

Result:

  • Consistent data format across entire database
  • Accurate deduplication and matching
  • Improved data quality and reliability
  • Better user experience with predictable outputs

API Logic

This section explains the logic flow for all API endpoints. Each subsection describes what happens when an API endpoint is called, which functions are used, and what those functions do.

Rate Limits

Comprehensive information about rate limiting, queue systems, and connection pooling for all API endpoints.

Overall API Rate Limits Analysis

⚠️ Important: No Global Rate Limit

There is NO GLOBAL RATE LIMIT across all endpoints. Each endpoint has its own independent rate limit per IP address.

Single IP Maximum: 28,000 requests/second (across all endpoints)
Per Minute: 1,680,000 requests/minute
Multiple IPs: UNLIMITED (each IP has independent limits)

Complete Rate Limits by Endpoint

Endpoint | Rate Limit | Requests/Second
/company_lookup | 1000/second | 1,000
/company_push | 5000/second | 5,000
/company_push_patch | 5000/second | 5,000
/contact_lookup | 1000/second | 1,000
/contact_push | 5000/second | 5,000
/contact_push_patch | 5000/second | 5,000
/csv_upload | 1000/second | 1,000
/company_delete_fields | 1000/second | 1,000
/contact_delete_fields | 1000/second | 1,000
/company_get | 1000/second | 1,000
/contact_get | 1000/second | 1,000
/health | No limit | Unlimited
/redis_status | No limit | Unlimited
/job_status/{job_id} | No limit | Unlimited
/queue_stats | No limit | Unlimited
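
Because the limits are independent, the ceiling for a single IP is the sum of the per-endpoint limits. A quick Python check using only the numbers in the table above: the eleven limited endpoints add up to 27,000 req/sec. That is slightly below the 28,000 req/sec headline figure, and this section does not say where the remaining 1,000 req/sec comes from.

```python
# Per-IP limits in requests/second, copied from the table above.
LIMITS_PER_SECOND = {
    "/company_lookup": 1000,
    "/company_push": 5000,
    "/company_push_patch": 5000,
    "/contact_lookup": 1000,
    "/contact_push": 5000,
    "/contact_push_patch": 5000,
    "/csv_upload": 1000,
    "/company_delete_fields": 1000,
    "/contact_delete_fields": 1000,
    "/company_get": 1000,
    "/contact_get": 1000,
}

total_per_second = sum(LIMITS_PER_SECOND.values())
total_per_minute = total_per_second * 60
print(f"{total_per_second:,} req/sec, {total_per_minute:,} req/min per IP")
```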

Actual System Bottlenecks

⚠️ Critical: Rate Limiter is NOT the Bottleneck!
Rate limiter allows (write endpoints): 5,000 req/sec
Actual system capacity (writes): ~20 req/sec
Gap: Rate limiter is 250× higher than actual capacity!

1. Queue System (Write Endpoints)
Max Queue Size: 5,000 jobs
Workers: 50 threads
Avg Processing: 2-3 sec/job
Throughput: ~20 req/sec
Burst Capacity: 5,000 jobs

2. Database Connection Pool
Max Connections: 120
Avg Query Time: 50-120ms
Throughput: ~1,500 queries/sec
Request Capacity: ~750 req/sec (avg 2 queries per request)
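
The capacity figures above follow from simple division. The Python sketch below reproduces them under two assumptions: a processing time of 2.5 seconds (the midpoint of 2-3 sec/job) and a query time of 80 ms (the value that reproduces the ~1,500 queries/sec figure). The function names are illustrative.

```python
def worker_throughput(workers: int, seconds_per_job: float) -> float:
    """Jobs per second a fixed worker pool can sustain."""
    return workers / seconds_per_job


def pool_throughput(connections: int, seconds_per_query: float) -> float:
    """Queries per second a connection pool can sustain."""
    return connections / seconds_per_query


writes_per_sec = worker_throughput(workers=50, seconds_per_job=2.5)
queries_per_sec = pool_throughput(connections=120, seconds_per_query=0.08)
requests_per_sec = queries_per_sec / 2          # avg 2 queries per request
burst_drain_seconds = 5000 / writes_per_sec     # time to work off a full queue
limiter_gap = 5000 / writes_per_sec             # write rate limit vs. real capacity

print(round(writes_per_sec), round(queries_per_sec), round(requests_per_sec))
print(round(burst_drain_seconds), round(limiter_gap))
```

This prints `20 1500 750` and `250 250`: about 20 writes/sec, a full queue that takes roughly 250 seconds (about 4 minutes) to drain, and a write rate limit 250× above what the workers can sustain.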

Maximum Requests Summary

Measure | Single IP | System-Wide (All IPs)
Rate Limiter Allows | 28,000 req/sec | Unlimited
Queue Can Accept (writes) | 20 req/sec | 20 req/sec
Database Can Handle | ~750 req/sec | ~750 req/sec
Actual Capacity (writes) | 20 req/sec | 20 req/sec
Actual Capacity (reads) | 1,000 req/sec | Unlimited

Per Minute | Single IP | System-Wide (All IPs)
Rate Limiter Allows | 1,680,000 req/min | Unlimited
Actual Capacity (writes) | 1,200 req/min | 1,200 req/min
Actual Capacity (reads) | 60,000 req/min | Unlimited

Real-World System Capacity Scenarios

Scenario 1: Read-Only Traffic (Lookups & Gets)

Endpoints:

  • /company_lookup (cached 90%)
  • /contact_lookup (cached 90%)
  • /company_get
  • /contact_get

With 90% cache hit rate:

Cached requests: ~10ms response time
DB requests (10%): ~80ms response time

Example: 1,000 req/sec incoming

• 900 req/sec cached (no DB needed)

• 100 req/sec hit database

• 100 × 0.08 sec = 8 concurrent connections

Bottleneck: Rate limiter (intentional throttling)

Maximum: 1,000 req/sec per IP per endpoint

System-wide: UNLIMITED (multiple IPs)

Scenario 2: Write Traffic (Push/Patch)

Endpoints:

  • /company_push
  • /company_push_patch
  • /contact_push
  • /contact_push_patch

Queue workers: 50
Processing time: 2-3 sec per job
Throughput: ~20 jobs/sec
Burst capacity: 5,000 jobs
Burst duration: ~4 minutes

Bottleneck: Background workers

Rate limiter allows: 5,000 req/sec per IP

System can handle: ~20 req/sec sustained

Gap: 250× higher than actual capacity!

Scenario 3: Mixed Traffic (Typical Production)

Typical production load distribution:

Read traffic (lookups/gets): 80%
Write traffic (push/patch): 20%

Example: 100 req/sec total

• 80 req/sec reads → ~8 DB connections (90% cached)

• 20 req/sec writes → All 50 workers busy

Result: System at capacity (workers saturated)

Bottleneck: Workers (20 req/sec write limit)

Key Insights

1. Rate Limiter is NOT the Bottleneck

The rate limiter is 250× HIGHER than actual capacity for write operations.

Why? Rate limiter prevents abuse, but workers prevent overload.

2. Different Limits for Different Operations

Read operations (lookups, gets):

  • Rate limiter: 1,000 req/sec per IP
  • Database: ~750 req/sec (with caching much higher)
  • Bottleneck: Rate limiter (intentional)

Write operations (push, patch):

  • Rate limiter: 5,000 req/sec per IP
  • Workers: ~20 req/sec
  • Bottleneck: Workers (need to scale)
3. No Global Rate Limit

Each endpoint has independent limits. A client can simultaneously:

  • Send 1,000 req/sec to /company_lookup
  • AND 5,000 req/sec to /company_push
  • AND 1,000 req/sec to /contact_lookup
  • ... all from the same IP!

Total: 28,000 req/sec from single IP (rate limiter allows)

But system will return queue_full at ~20 req/sec for writes.

Scaling Recommendations

Option 1: Add Global Rate Limit
@app.middleware("http")
async def global_rate_limit(...)

Limit: 100 req/sec per IP across ALL endpoints. More realistic than 28,000 req/sec.

Option 2: Scale Workers
start_workers(num_workers=200)

New capacity: ~80 req/sec (4× improvement)

Option 3: Horizontal Scaling

Deploy multiple API instances:

  • Instance 1: 50 workers = 20 req/sec
  • Instance 2: 50 workers = 20 req/sec
  • Instance 3: 50 workers = 20 req/sec

Total: 60 req/sec capacity

Option 4: Lower Rate Limits
@limiter.limit("50/second")

Match rate limits to actual capacity instead of 5000/second

Rate Limiting System (Per Endpoint)

Implementation: SlowAPI library with in-memory storage. Rate limits are applied per IP address with independent limits for each client.

Endpoint | Rate Limit | Requests/Second | Purpose
/company_lookup | 1000/second | 1,000 | Read-only lookup
/company_push | 5000/second | 5,000 | Write operations
/company_push_patch | 5000/second | 5,000 | Write operations
/contact_lookup | 1000/second | 1,000 | Read-only lookup
/contact_push | 5000/second | 5,000 | Write operations
/contact_push_patch | 5000/second | 5,000 | Write operations
/csv_upload | 1000/second | 1,000 | Job submission
/company_delete_fields | 1000/second | 1,000 | Delete operations
/contact_delete_fields | 1000/second | 1,000 | Delete operations
/company_get | 1000/second | 1,000 | Read operations
/contact_get | 1000/second | 1,000 | Read operations

Rate Limit Exceeded Response (HTTP 429)

{
  "error": "Rate limit exceeded: 1000 per 1 second",
  "detail": "Too many requests"
}

Why Different Limits?

  • 1,000 req/sec: Fast database reads with caching, controlled deletions, job submission only
  • 5,000 req/sec: Write operations need burst capacity for bulk imports; queue system provides backpressure control

Queue System (Background Job Processing)

Max Queue Size

5,000 jobs

Background Workers

50 threads

Job Timeout

55 seconds

Queue Full Response

{
  "error": "queue_full",
  "message": "System at capacity, please retry in a few seconds",
  "queue_size": 5001,
  "retry_after_seconds": 5,
  "max_queue_size": 5000
}

Timeout Response

{
  "error": "timeout",
  "job_id": "abc-123-def-456",
  "message": "Request timeout, poll /job_status/{job_id}"
}

Request Flow:

  1. Request arrives
  2. Rate limiter checks: Under limit for this IP? → If NO, return 429 error
  3. Queue size check: Queue under 5,000? → If NO, return queue_full response
  4. Enqueue job with UUID
  5. Wait for result (55 seconds with automatic retry)
  6. Background worker processes job
  7. Return result or timeout with job_id for polling
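
Steps 3 to 7 of the request flow can be sketched with `asyncio`. This is a simplified model, not the service's code: it uses asyncio tasks where the documentation describes worker threads, leaves out the rate limiter (step 2) and the automatic retry, and the names `submit`, `worker`, and `results` are hypothetical.

```python
import asyncio
import uuid

MAX_QUEUE_SIZE = 5000
WAIT_TIMEOUT_SECONDS = 55


async def submit(queue: asyncio.Queue, results: dict, payload: dict,
                 timeout: float = WAIT_TIMEOUT_SECONDS) -> dict:
    """Check queue size, enqueue the job, then wait for a worker or time out."""
    if queue.qsize() >= MAX_QUEUE_SIZE:
        return {"error": "queue_full", "queue_size": queue.qsize(),
                "max_queue_size": MAX_QUEUE_SIZE, "retry_after_seconds": 5}
    job_id = str(uuid.uuid4())
    done = asyncio.Event()
    results[job_id] = {"event": done, "result": None}
    await queue.put((job_id, payload))
    try:
        await asyncio.wait_for(done.wait(), timeout=timeout)
        return results.pop(job_id)["result"]
    except asyncio.TimeoutError:
        # The entry stays in `results` so the job can still finish and be polled.
        return {"error": "timeout", "job_id": job_id}


async def worker(queue: asyncio.Queue, results: dict) -> None:
    """Take jobs off the queue and publish their results."""
    while True:
        job_id, payload = await queue.get()
        entry = results.get(job_id)
        if entry is not None:
            entry["result"] = {"status": "processed", "input": payload}
            entry["event"].set()
        queue.task_done()


async def main() -> None:
    queue, results = asyncio.Queue(), {}
    task = asyncio.create_task(worker(queue, results))
    print(await submit(queue, results, {"company_domain": "example.com"}, timeout=1))
    task.cancel()


if __name__ == "__main__":
    asyncio.run(main())
```

The size check happens before the job is created, so a full queue costs almost nothing to reject. That is what makes it usable as backpressure.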

Queue Architecture

API Request (incoming HTTP request)
  → Rate Limiter Check (1000 or 5000 req/sec per IP)
  → Queue Size Check (is queue < 5000 jobs?)
      FULL → Return queue_full (error response)
      OK → Enqueue Job (create UUID & event)
  → Wait for Result (55 seconds timeout, connection held open)
      COMPLETE → Return Result (success response)
      TIMEOUT → Return job_id (poll for status)

Job Queue (in-memory FIFO, max size: 5,000 jobs)
  → Worker #1 ... Worker #50 (processing)
  → Database Operations (via connection pool)

Queue Backpressure (Prevents Overload)

The queue size check acts as backpressure. When the queue fills up (5,000 jobs), new requests are rejected with a queue_full error. This prevents memory exhaustion and maintains system stability during traffic spikes.

Queue System: Detailed Overview

System Overview

Acceptance Rate: 5,000 req/sec

Maximum requests the API can accept

Processing Rate: ~1,000 req/sec

Actual throughput (DB connection limited)

The asynchronous queue system allows accepting up to 5× more requests than the system can process, providing a buffer for traffic spikes while maintaining stable processing rates.

Request Outcomes

Scenario A: Fast Processing (< 30 seconds)

Request:

POST /company_push_patch
{
  "company_name": "Example Corp",
  "company_domain": "example.com"
}

Response (within 30s):

{
  "company_main_id": "uuid-123",
  "company_workspace_id": "uuid-456",
  "status_company": "created",
  "status_company_workspace": null,
  "error": null,
  "input": {
    "company_name": "Example Corp",
    "company_domain": "example.com"
  }
}

✓ Job completed immediately - feels synchronous to the client

Scenario B: Timeout (> 30 seconds)

Initial Response (after 30s timeout):

{
  "error": "timeout",
  "message": "Job not completed within 30 seconds",
  "job_id": "abc-123-def-456",
  "status": "queued",
  "queue_position": 142,
  "estimated_wait_seconds": 14.2,
  "check_status_url": "/job_status/abc-123-def-456"
}

Then poll for result:

GET /job_status/abc-123-def-456
{
  "job_id": "abc-123-def-456",
  "status": "completed",
  "result": {
    "company_main_id": "uuid-123",
    "company_workspace_id": "uuid-456",
    "status_company": "created",
    "status_company_workspace": null,
    "error": null,
    "input": { "company_name": "Example Corp", "company_domain": "example.com" }
  },
  "metadata": {
    "job_type": "company_push_patch",
    "queued_at": "2025-10-11T12:00:00Z",
    "started_at": "2025-10-11T12:00:30Z",
    "completed_at": "2025-10-11T12:00:31Z",
    "status": "completed",
    "worker_id": 3
  }
}

⚠ Job queued - client needs to poll for result

Endpoint Classification

Queued Endpoints (5,000 req/sec)

These endpoints use the queue system:

  • POST /company_push
  • POST /company_push_patch
  • POST /contact_push
  • POST /contact_push_patch
Direct Endpoints (1,000 req/sec)

These remain synchronous (no queue):

  • POST /company_lookup (read-only, fast)
  • POST /contact_lookup (read-only, fast)

Performance Under Different Load Conditions

Light Load (0-1,000 req/sec)
  • All requests complete within 30s
  • No timeouts
  • Immediate responses
  • User experience: Synchronous feel

Medium Load (1,000-3,000 req/sec)
  • Most requests complete within 30s
  • Occasional timeouts for burst traffic
  • 90%+ immediate responses
  • User experience: Mostly synchronous

Heavy Load (3,000-5,000 req/sec)
  • Many timeouts (jobs queued > 30s)
  • Clients need to poll for results
  • 100% acceptance rate (no rejections)
  • User experience: Async with polling

Overload (>5,000 req/sec)
  • Rate limiter kicks in
  • Requests beyond 5,000/sec get HTTP 429
  • Still better than before (was failing at 200/sec)
  • User experience: Rate limit errors

Before vs After Queue System

Metric | Before (Synchronous) | After (Queue System)
Rate limit | 1,000/sec | 5,000/sec
Connection pool | 20 | 120
Actual capacity | ~200 req/sec | ~1,000 req/sec
Acceptance rate | ~200 req/sec | 5,000 req/sec
Failures at 1,000 req/sec | 80% | 0%

Database Connection Pool

Min Connections

10

Always kept alive in pool

Max Connections

120

Matches Supabase Pro tier limit

Connection Pool Benefits:

  • Performance: Reuses connections (no TCP handshake overhead)
  • Resource Management: Limits database connections, prevents exhaustion
  • Thread Safety: Multiple workers can request connections concurrently

Why 120 connections?

Supports 1,000 req/sec with avg 120ms query time: 1000 × 0.12 = 120 concurrent queries
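
The sizing rule above is an application of Little's law: average concurrency equals arrival rate times average time in the system. A short Python sketch, with a hypothetical helper name, that reproduces the figures quoted in this document:

```python
import math


def required_connections(requests_per_second: float, avg_query_seconds: float,
                         queries_per_request: float = 1.0) -> int:
    """Little's law: concurrent queries = arrival rate x time per query."""
    concurrent = requests_per_second * queries_per_request * avg_query_seconds
    # Round away floating-point noise before taking the ceiling.
    return math.ceil(round(concurrent, 6))


print(required_connections(1000, 0.12))  # 120, the pool maximum
print(required_connections(100, 0.08))   # 8, the read scenario with 90% cache hits
```

The same formula shows the effect of multiple queries per request: at 2 queries per request and 120 ms each, 1,000 req/sec would need 240 connections.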

ThreadedConnectionPool (psycopg2)

Application Layer Connection Management

Available Connections (10-120): Conn 1, Conn 2, Conn 3, ..., Conn N
  ↕ getconn() / putconn()
In-Use Connections (0-120): application queries
  ↓ TCP connection
Supabase Connection Pooler (aws-1-eu-central-1.pooler.supabase.com)
  PgBouncer: Session Pooling, Max: 1000 connections
  ↓
PostgreSQL Database (Supabase hosted)

Connection Lifecycle

Workers request connections via getconn(), use them for database operations, then return them via putconn(). Connections are reused, avoiding the overhead of establishing new TCP connections for each query.

Rate Limiting + Queue Interaction

Request Flow

  1. Request arrives
     Client sends HTTP request to API endpoint
  2. Rate limiter check: Under 5000/sec for this IP?
     → YES: Continue to next step
     → NO: Return HTTP 429 error
  3. Queue size check: Queue under 5000?
     → YES: Enqueue job and continue
     → NO: Return queue_full response
  4. Background worker processes job
     One of 50 workers picks up and executes the job

⚠️ Two Layers of Protection

Rate limiting happens BEFORE queue check. This dual-layer protection prevents both abuse (rate limiter) and system overload (queue limit).

System Performance

Component | Configuration | Purpose | Limit
Rate Limiter | 1,000-5,000 req/sec per IP | Prevent abuse | Per client IP
Queue | Max 5,000 jobs | Buffer requests | System-wide
Workers | 50 background threads | Process jobs | ~20 jobs/sec throughput
Connection Pool | 10-120 connections | Database access | Supabase limit: 1,000
Job Timeout | 55 seconds | Prevent hanging | Railway: 60 sec
Retry | 1 automatic retry | Handle edge cases | 110 sec total

Request Path

Client → Rate Limiter (1000-5000/s) → Queue Check (< 5000 jobs) → Enqueue → Worker (50 workers) → Database (120 conn) → Response

Load Scenarios

✓ Low Load (< 10 req/sec)

  • Queue Size: 0-50 jobs
  • Connection Pool: 10-20 connections in use
  • Response Time: < 1 second

○ Medium Load (100 req/sec)

  • Queue Size: 200-400 jobs
  • Connection Pool: 50-80 connections in use
  • Response Time: 4-8 seconds

⚠ High Load (500 req/sec)

  • Queue Size: 2000-4000 jobs
  • Connection Pool: 100-120 connections in use
  • Response Time: 40-80 seconds
  • Warning: Approaching queue_full threshold

🔴 Overload (1000+ req/sec sustained)

  • Queue Size: Hits 5,000 max
  • Connection Pool: 120 connections in use
  • Many requests return queue_full error
  • Action Required: Client backs off, retries after 5 seconds
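
On the client side, the back-off described above could look like the following Python sketch. The helper name `retry_delay_seconds`, the exponential growth, the 60-second cap, and the one-second base for HTTP 429 are all assumptions. Only the `queue_full` error code and its `retry_after_seconds` field come from this documentation.

```python
import random
from typing import Optional


def retry_delay_seconds(status_code: int, body: dict, attempt: int) -> Optional[float]:
    """Return how long to wait before retrying, or None if no retry is needed."""
    if body.get("error") == "queue_full":
        base = float(body.get("retry_after_seconds", 5))
    elif status_code == 429:
        base = 1.0   # assumption: limits are per second, so start at one second
    else:
        return None
    # Exponential backoff, capped, with jitter so clients do not retry in lockstep.
    backoff = min(base * (2 ** attempt), 60.0)
    return backoff + random.uniform(0, 1)


delay = retry_delay_seconds(429, {"error": "Rate limit exceeded: 1000 per 1 second"}, attempt=0)
print(1.0 <= delay <= 2.0)   # True
```

The function keys on the response body for `queue_full` because this section does not state which HTTP status code accompanies that response.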

Job Status Polling

Endpoint

GET /job_status/{job_id}

Used when a request times out and the client needs to check the result later.

1. Queued
{
  "status": "queued",
  "result": null,
  "queue_position": 42,
  "estimated_wait_seconds": 126
}
2. Processing
{
  "status": "processing",
  "result": null,
  "queue_position": 0,
  "estimated_wait_seconds": 0
}
3. Completed
{
  "status": "completed",
  "result": {
    "company_main_id": "uuid",
    "company_workspace_id": "uuid",
    "status_company": "created",
    "status_company_workspace": "created",
    "error": null
  },
  "queue_position": 0,
  "estimated_wait_seconds": 0
}
4. Failed
{
  "status": "failed",
  "result": {
    "error": "Database connection failed"
  },
  "queue_position": 0,
  "estimated_wait_seconds": 0
}
5. Not Found
{
  "status": "not_found",
  "result": null,
  "queue_position": 0,
  "estimated_wait_seconds": 0
}
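
A client-side polling loop over these five states might look like the Python sketch below. It is transport-agnostic by design: `fetch_status` is a hypothetical callable that performs `GET /job_status/{job_id}` and returns the parsed JSON body, so the example runs without an HTTP library. The delay bounds and the overall time budget are assumptions.

```python
import time
from typing import Callable

TERMINAL_STATUSES = {"completed", "failed", "not_found"}


def poll_job(fetch_status: Callable[[], dict], max_wait_seconds: float = 120.0,
             min_delay: float = 1.0, max_delay: float = 10.0,
             sleep: Callable[[float], None] = time.sleep) -> dict:
    """Call fetch_status() until the job reaches a terminal status or time runs out."""
    waited = 0.0
    while True:
        body = fetch_status()
        if body.get("status") in TERMINAL_STATUSES:
            return body
        # Use the server's estimate as a hint, clamped to a sane range.
        hint = body.get("estimated_wait_seconds") or 0
        delay = min(max(hint, min_delay), max_delay)
        if waited + delay > max_wait_seconds:
            raise TimeoutError("job did not finish within max_wait_seconds")
        sleep(delay)
        waited += delay


# Simulated server responses: queued, then processing, then completed.
responses = iter([
    {"status": "queued", "estimated_wait_seconds": 126},
    {"status": "processing", "estimated_wait_seconds": 0},
    {"status": "completed", "result": {"error": None}},
])
print(poll_job(lambda: next(responses), sleep=lambda seconds: None)["status"])
```

The loop treats `failed` and `not_found` as terminal as well, so a caller should inspect `status` on the returned body rather than assume success.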

Queue Statistics

Endpoint

GET /queue_stats

Response

{
  "queue_size": 142,
  "active_workers": 50,
  "jobs_completed": 125847,
  "jobs_failed": 23,
  "average_processing_time_seconds": 2.3,
  "queue_capacity": 5000,
  "queue_utilization_percentage": 2.84
}

Metrics Explanation

queue_size: Current jobs waiting in queue
active_workers: Number of worker threads (always 50)
jobs_completed: Total successful jobs since startup
jobs_failed: Total failed jobs since startup
average_processing_time_seconds: Mean time per job
queue_capacity: Maximum queue size (5000)
queue_utilization_percentage: Current queue fill percentage

Health Check Indicators

Healthy System

Queue size: < 1,000
Queue full errors: 0
Timeout rate: < 1%
Connection usage: < 70%
Worker utilization: < 80%

Warning State

Queue size: 1,000-3,000
Queue full errors: < 10/min
Timeout rate: 1-5%
Connection usage: 70-90%
Worker utilization: 80-95%

🔴 Critical State

Queue size: > 4,000
Queue full errors: > 50/min
Timeout rate: > 10%
Connection usage: > 95%
Worker utilization: > 95%
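
A monitoring script could derive a health state from the `/queue_stats` response. The Python sketch below uses only the queue-size thresholds listed above. The function names are hypothetical, and because the 3,000-4,000 range is not classified in this section, treating it as "warning" is an assumption.

```python
def queue_utilization_percentage(queue_size: int, queue_capacity: int) -> float:
    """Queue fill level in percent, rounded to two decimals."""
    return round(queue_size / queue_capacity * 100, 2)


def queue_health(queue_size: int) -> str:
    """Classify queue size into healthy / warning / critical."""
    if queue_size < 1000:
        return "healthy"
    if queue_size > 4000:
        return "critical"
    return "warning"   # includes the 3,000-4,000 gap (assumption)


stats = {"queue_size": 142, "queue_capacity": 5000}   # values from the sample response
print(queue_utilization_percentage(stats["queue_size"], stats["queue_capacity"]))  # 2.84
print(queue_health(stats["queue_size"]))                                          # healthy
```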

System Tuning Parameters

To increase system throughput, consider adjusting the following parameters:

1. Increase Workers

await queue_manager.start_workers(num_workers=100) # Was 50
Pros: 2× throughput (34-50 jobs/sec)
Cons: More CPU usage, more database connections

2. Increase Connection Pool

pool = ThreadedConnectionPool(minconn=20, maxconn=200, dsn=...) # Was 10-120
Pros: More concurrent queries possible
Cons: May exceed Supabase limit, more memory usage

3. Increase Queue Size

MAX_QUEUE_SIZE = 10000 # Was 5000
Pros: Accept more burst load
Cons: Longer wait times, more memory usage

4. Decrease Timeout

result = await queue_manager.wait_for_job(job_id, timeout=30) # Was 55
Pros: Faster timeout detection
Cons: More timeout responses to clients

Railway Deployment Considerations

Railway Timeout

Hard Limit: 60 seconds per request

Why 55-second timeout in code?

  • Railway kills requests at 60 seconds
  • 55-second timeout leaves 5-second buffer
  • Buffer allows time to return timeout response
  • Client receives job_id for polling

Automatic Retry Logic

if result.get('error') == 'timeout':
    # Retry by waiting for the job again
    retry_result = await queue_manager.wait_for_job(result['job_id'], timeout=55)
    return retry_result

Request Flow:

  1. First wait: 55 seconds → timeout
  2. Automatic retry: wait another 55 seconds
  3. Total time: up to 110 seconds
  4. If still timeout: Return job_id to client for polling

Why retry?

  • Job might complete just after first timeout
  • Gives job extra time before returning polling response
  • Reduces need for client polling

Startup Configuration

🚀 FastAPI server starting...
⚡ Event loop: uvloop (2-4× faster on Linux)
📊 Database pool: 10-120 connections
🔒 Rate limit: 5000 requests/second per IP
⏱️  Queue timeout: 55 seconds (with 5s buffer before Railway's 60s timeout)
🛡️  Max queue size: 5000 (backpressure enabled)

Summary Table

Component | Configuration | Purpose | Limit
Rate Limiter | 1,000-5,000 req/sec per IP | Prevent abuse | Per client IP
Queue | Max 5,000 jobs | Buffer requests | System-wide
Workers | 50 background threads | Process jobs | Throughput: ~20/sec
Connection Pool | 10-120 connections | Database access | Supabase limit: 1,000
Job Timeout | 55 seconds | Prevent hanging | Railway: 60 sec
Retry | 1 automatic retry | Handle edge cases | 110 sec total

Request Path

Client → Rate Limiter (1000-5000/s) → Queue Check (< 5000 jobs) → Enqueue → Worker (50 workers) → Database (120 conn) → Response

Companies

API endpoints for managing company data including lookup, retrieval, creation, updates, and field deletion.

POST

/company_lookup

Purpose: Search for an existing company in the database using identifiers like name, domain, or LinkedIn URL.

Logic Flow

1. Receive Request Data

Accepts: company_name, company_domain, company_linkedin, workspace_id

2. Clean the Data

Calls: clean_company_fields() from processing_cleaning_funtions/company_field_cleaning.py

What it does: Normalizes the input data (removes trailing slashes, converts to lowercase, standardizes URLs)

3. Check Cache
  • Generates a unique cache key based on the input parameters
  • Checks if this lookup was done recently
  • If found in cache: Returns the cached result immediately
  • If not in cache: Continues to database lookup
4. Lookup Company

Calls: lookup_company() from processing_lookup_funtions/company_lookup.py

What it does:

  • Searches the v_company_lookup database view
  • Checks if the provided name, domain, or LinkedIn matches any company
  • Uses OR logic: matches if ANY identifier matches
  • Returns FIRST match found (LIMIT 1)
  • If workspace_id is provided: Also checks if this company is connected to that workspace

Returns: company_main_id and company_workspace_id (if found)

5. Cache the Result

Saves the result to cache for 1 hour. This includes "not found" results to avoid repeated database queries.

6. Return Response

Returns: {company_main_id, company_workspace_id, error}

  • company_main_id: The unique ID of the company (or null if not found)
  • company_workspace_id: The ID of the workspace connection (or null if not found or no workspace_id provided)

Error Handling in API File

The process_company_lookup() function uses try/except to handle errors:

Try Block:
  • Validates that data is provided (if not: returns error "No data provided")
  • Sets connection pool for lookup module
  • Calls cleaning function to normalize data
  • Calls lookup function (which has its own internal error handling)
  • Caches the result
  • Returns formatted response
Except Block:
  • Catches any unexpected errors during the process
  • Returns response with null IDs and error message as string
  • Examples: Cleaning function fails, Cache system errors, Unexpected exceptions
Called Functions:
  • clean_company_fields(): Normalizes company data (URLs, whitespace, domains)
  • lookup_company(): Queries database for company match, manages database connection

Match Priority

When searching with multiple identifiers (e.g., both domain and LinkedIn), the system:

  • Uses OR logic: Matches if ANY identifier matches
  • Returns the FIRST match found
  • Does NOT rank or score matches

Example:

Input: company_domain = "example.com", company_linkedin = "linkedin.com/company/different-company"

Result: Returns whichever company is found first in the database

Best Practice: Use the most specific identifier you have (domain is usually most reliable)

Error Scenarios

No identifiers provided:

{company_main_id: null, company_workspace_id: null, error: "No lookup fields provided"}

Company not found:

{company_main_id: null, company_workspace_id: null, error: null}

Note: This is a valid result (company doesn't exist), not an error

Database error:

{company_main_id: null, company_workspace_id: null, error: "Database query failed: [details]"}

Company found but not in workspace:

{company_main_id: "uuid", company_workspace_id: null, error: null}

This means: Company exists globally, but not connected to the specified workspace

Practical Examples

Request:

POST /company_lookup
{
  "company_domain": "anthropic.com",
  "workspace_id": "workspace-123"
}

Response - Company found and in workspace:

{
  "company_main_id": "f7e9a8b1-1234-5678-9abc-def123456789",
  "company_workspace_id": "a1b2c3d4-5678-9abc-def1-23456789abcd",
  "error": null
}

Response - Company found but NOT in workspace:

{
  "company_main_id": "f7e9a8b1-1234-5678-9abc-def123456789",
  "company_workspace_id": null,
  "error": null
}

Response - Company not found:

{
  "company_main_id": null,
  "company_workspace_id": null,
  "error": null
}

People

API endpoints for managing contact/people data including lookup, retrieval, creation, updates, and field deletion.

POST

/contact_lookup

Purpose: Search for an existing contact (person and/or lead) in the database using identifiers like email, LinkedIn, Xing, or name.

Logic Flow

1. Receive Request Data

Accepts: contact_email_valid, contact_email_catch_all, contact_email_invalid, contact_email_unsure, contact_linkedin, contact_xing, contact_first_name, contact_first_name_cleaned, contact_last_name, contact_last_name_cleaned, company_id, workspace_id

2. Clean the Data

Calls: clean_contact_fields() from processing_cleaning_funtions/contact_field_cleaning.py

What it does: Normalizes the input data (lowercase emails, standardizes URLs, trims whitespace)

3. Check Cache
  • Generates a unique cache key based on the input parameters
  • Checks if this lookup was done recently
  • If found in cache: Returns the cached result immediately
  • If not in cache: Continues to database lookup
4. Lookup Contact (Two-Phase Lookup)

Calls: lookup_contact() from processing_lookup_funtions/contact_lookup.py

Phase 1 - Lead Lookup (with company_id):

Searches for a lead (person working at a specific company)

Matches by:

  • LinkedIn + company_id
  • Xing + company_id
  • Email addresses (any type)
  • First name + Last name + company_id

Uses OR logic: Matches if ANY condition is true

If match found: Returns both lead_id and person_id, then STOPS (doesn't run Phase 2)

Phase 2 - Person Lookup (without company_id):

Only runs if Phase 1 found nothing

Searches for a person regardless of company

Matches by:

  • LinkedIn (without company requirement)
  • Xing (without company requirement)

If match found: Returns person_id only (lead_id is null)

Workspace Lookup:

If workspace_id is provided and person/lead is found:

  • Checks if the person is connected to that workspace
  • Checks if the lead is connected to that workspace

Returns: lead_id, person_id, lead_workspace_id, people_workspace_id

5. Cache the Result

Saves the result to cache for 1 hour. This includes "not found" results to avoid repeated database queries.

6. Return Response

Returns: {lead_id, person_id, lead_workspace_id, people_workspace_id, error}

  • lead_id: The unique ID of the lead (or null if not found)
  • person_id: The unique ID of the person (or null if not found)
  • lead_workspace_id: The ID of the lead's workspace connection (or null)
  • people_workspace_id: The ID of the person's workspace connection (or null)

Understanding Person vs Lead

  • Person: Represents an individual with basic information (name, demographics, career history)
  • Lead: Represents a person's connection to a specific company (their position, department, start date at that company)
  • One person can have multiple leads (if they worked at multiple companies)
  • The lookup prioritizes finding leads (person + company match) over just finding the person

Example:

  • John Smith works at Company A as CEO (Lead 1)
  • John Smith works at Company B as Advisor (Lead 2)
  • Both leads link to the same Person record (John Smith)

Match Priority

The lookup follows a strict priority order:

  1. Email (any type) - Always checked first in Phase 1
  2. LinkedIn + company_id - Checked in Phase 1
  3. Xing + company_id - Checked in Phase 1
  4. Name + company_id - Checked in Phase 1
  5. LinkedIn alone - Only checked in Phase 2 (if Phase 1 fails)
  6. Xing alone - Only checked in Phase 2 (if Phase 1 fails)

Why this order?

  • Emails are unique and most reliable
  • Social profiles with company context are more specific than just names
  • Phase 2 is a fallback for when we don't have company context

Returns first match: Like company lookup, returns the first match found
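
The two-phase lookup can be modelled in memory with a few lines of Python. This is a sketch of the control flow only: the real `lookup_contact()` queries the database, while the `Lead` and `Person` dataclasses and the function `two_phase_lookup` are hypothetical. Name matching and the workspace lookup are left out to keep the example short.

```python
from dataclasses import dataclass, field
from typing import Optional

EMAIL_FIELDS = ("contact_email_valid", "contact_email_catch_all",
                "contact_email_invalid", "contact_email_unsure")


@dataclass
class Lead:
    lead_id: str
    person_id: str
    company_id: str
    emails: set = field(default_factory=set)
    linkedin: Optional[str] = None
    xing: Optional[str] = None


@dataclass
class Person:
    person_id: str
    linkedin: Optional[str] = None
    xing: Optional[str] = None


def two_phase_lookup(query: dict, leads: list, people: list) -> dict:
    emails = {query[f] for f in EMAIL_FIELDS if query.get(f)}
    linkedin, xing = query.get("contact_linkedin"), query.get("contact_xing")
    company_id = query.get("company_id")

    # Phase 1: lead lookup. Any one condition is enough (OR logic).
    for lead in leads:
        same_company = company_id is not None and lead.company_id == company_id
        if (emails & lead.emails
                or (same_company and linkedin and lead.linkedin == linkedin)
                or (same_company and xing and lead.xing == xing)):
            return {"lead_id": lead.lead_id, "person_id": lead.person_id}

    # Phase 2: person lookup, only reached when Phase 1 found nothing.
    for person in people:
        if (linkedin and person.linkedin == linkedin) or (xing and person.xing == xing):
            return {"lead_id": None, "person_id": person.person_id}

    return {"lead_id": None, "person_id": None}


leads = [Lead("lead-1", "person-abc", "company-a", {"john@example.com"},
              linkedin="linkedin.com/in/johndoe")]
people = [Person("person-abc", linkedin="linkedin.com/in/johndoe")]
query = {"contact_linkedin": "linkedin.com/in/johndoe", "company_id": "company-999"}
print(two_phase_lookup(query, leads, people))
```

The sample query finds no lead at `company-999` in Phase 1 and falls through to Phase 2, which returns the person without a lead: `{'lead_id': None, 'person_id': 'person-abc'}`.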

Error Scenarios

No identifiers provided:

{lead_id: null, person_id: null, lead_workspace_id: null, people_workspace_id: null, error: "No lookup fields available"}

Contact not found:

{lead_id: null, person_id: null, lead_workspace_id: null, people_workspace_id: null, error: null}

Note: This is a valid result (contact doesn't exist), not an error

Person found but not lead (Phase 2 success):

{lead_id: null, person_id: "uuid", lead_workspace_id: null, people_workspace_id: null, error: null}

This means: Person exists but we don't know their connection to the specified company

Lead found but not in workspace:

{lead_id: "uuid", person_id: "uuid", lead_workspace_id: null, people_workspace_id: "uuid", error: null}

This means: Lead exists, person is in workspace, but lead is not in workspace

Database error:

{lead_id: null, person_id: null, lead_workspace_id: null, people_workspace_id: null, error: "Database query failed: [details]"}

Practical Examples

Example 1: Full lead match with workspace

Request:

POST /contact_lookup
{
  "contact_email_valid": "john@anthropic.com",
  "company_id": "company-uuid-123",
  "workspace_id": "workspace-456"
}

Response:

{
  "lead_id": "lead-789",
  "person_id": "person-abc",
  "lead_workspace_id": "lead-ws-xyz",
  "people_workspace_id": "people-ws-def",
  "error": null
}

Interpretation: John works at this company (lead found), and both the lead and person are in the workspace.

Example 2: Person found but no lead (Phase 2 success)

Request:

POST /contact_lookup
{
  "contact_linkedin": "linkedin.com/in/johndoe",
  "company_id": "company-uuid-999"
}

Response:

{
  "lead_id": null,
  "person_id": "person-abc",
  "lead_workspace_id": null,
  "people_workspace_id": null,
  "error": null
}

Interpretation: John exists in database, but we don't have a record of him working at company 999. Phase 1 found nothing (no lead at that company), but Phase 2 found John by LinkedIn.

Example 3: Not found

Request:

POST /contact_lookup
{
  "contact_email_valid": "unknown@example.com"
}

Response:

{
  "lead_id": null,
  "person_id": null,
  "lead_workspace_id": null,
  "people_workspace_id": null,
  "error": null
}

Interpretation: This person doesn't exist in the database at all.

GET

/company_get

Purpose: Retrieve complete company data including all fields and identifiers. Query parameters: company_id (required) and workspace_id (optional), e.g. company_id=xxx&workspace_id=yyy

Logic Flow

1. Receive Request Parameters
  • Required: company_id
  • Optional: workspace_id
2. Get Main Company Data

Queries: db_companies_main table

Retrieves all company fields:

  • company_name_cleaned, company_legal_form, b2b_b2c
  • Address fields (street, city, zip, region, country)
  • Registration numbers (tax, VAT, Handelsregister)
  • Employee counts, founded year, description
  • Logo URL, LinkedIn size and followers
  • Arrays: company_tags, company_sources
  • Timestamps: created_at, updated_at
3. Get Company Identifiers

Queries: db_companies_dt_identifiers table

Retrieves all identifiers for this company

Organizes them by type into separate arrays:

  • company_names: Array of all name variations
  • company_domains: Array of all domains
  • company_emails: Array of all email addresses
  • company_phones: Array of all phone numbers
  • company_linkedins: Array of all LinkedIn URLs
  • company_instagrams: Array of all Instagram URLs
  • company_facebooks: Array of all Facebook URLs
  • company_xings: Array of all Xing URLs
  • company_pinterests: Array of all Pinterest URLs
  • company_tiktoks: Array of all TikTok URLs
  • company_youtubes: Array of all YouTube URLs
  • company_twitters: Array of all Twitter URLs
4. Get Workspace Data (if workspace_id provided)

Queries: db_companies_workspace table

Retrieves workspace-specific data:

  • company_workspace_id: The ID of the workspace connection
  • company_qualified: Qualification status in this workspace
  • company_custom_tags_ws: Custom tags for this workspace
  • Timestamps: workspace created_at, updated_at

If not provided or not found: Returns null values for workspace fields

5. Return Response

Returns: Complete company data dictionary with:

  • All main table fields
  • All identifiers organized in arrays by type
  • Workspace data (if workspace_id was provided)
  • Timestamps renamed to indicate source table (e.g., db_companies_main_created_at)
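
Step 3 above, organizing identifiers by type, can be sketched as follows. This is an illustration under assumptions, not the actual implementation: the function name group_identifiers, the (type, identifier) row shape, and the type values in TYPE_TO_FIELD are hypothetical, since the values stored in db_companies_dt_identifiers are not documented here.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical mapping from stored identifier type to response field.
TYPE_TO_FIELD = {
    "name": "company_names",
    "domain": "company_domains",
    "email": "company_emails",
    "phone": "company_phones",
    "linkedin": "company_linkedins",
    "instagram": "company_instagrams",
    "facebook": "company_facebooks",
    "xing": "company_xings",
    "pinterest": "company_pinterests",
    "tiktok": "company_tiktoks",
    "youtube": "company_youtubes",
    "twitter": "company_twitters",
}


def group_identifiers(rows: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (type, identifier) rows into one array per response field."""
    # Start with every array present and empty, so missing types yield [].
    grouped: Dict[str, List[str]] = {field: [] for field in TYPE_TO_FIELD.values()}
    for ident_type, identifier in rows:
        field = TYPE_TO_FIELD.get(ident_type)
        if field is not None:  # unknown types are skipped in this sketch
            grouped[field].append(identifier)
    return grouped
```

Pre-filling every key is what gives the "always returns arrays, even if empty" behavior described for this endpoint.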

Error Handling in API File

The process_company_get() function uses try/except to handle errors:

Try Block:
  • Validates that company_id is provided (if not: returns error "company_id is required")
  • Gets database connection from pool
  • Queries db_companies_main table for company data
  • If company not found: Returns error "Company not found"
  • Queries db_companies_dt_identifiers and organizes by type
  • If workspace_id provided: Queries db_companies_workspace table
  • Returns complete company data dictionary
Except Block:
  • Catches any unexpected errors during database queries or data processing
  • Returns error response with error message
  • Examples: Database connection fails, Query execution errors, Data formatting errors

Note:

No cleaning functions are called (get endpoints don't need cleaning). Database connection is managed internally.

Why Use Company Get?

  • Company Lookup tells you IF a company exists and returns just the IDs
  • Company Get gives you ALL the information about a company once you know its ID
  • Use lookup first to find the company, then use get to retrieve all details

Error Scenarios

Company doesn't exist:

{"error": "Company not found", "company_id": "xxx"}

This happens when the company_id doesn't exist in the database

Company exists but not in workspace:

Returns: Complete company data with workspace fields as null

company_workspace_id: null, company_qualified: null, company_custom_tags_ws: null

Database error:

{"error": "Database error: [error details]", "company_id": "xxx"}

Empty identifier arrays:

Always returns arrays (even if empty): company_domains: []

This is normal: not all companies have every type of identifier

Practical Example

Request:

GET /company_get?company_id=f7e9a8b1-1234-5678-9abc-def123456789&workspace_id=workspace-123

Response:

{
  "company_name_cleaned": "Anthropic",
  "company_legal_form": "Inc",
  "b2b_b2c": "B2B",
  "company_name_imprint": null,
  "company_street": "Main St",
  "company_street_nr": "123",
  "company_city": "San Francisco",
  "company_zip": "94105",
  "company_region": "California",
  "company_country": "USA",
  "company_tax_nr": null,
  "company_vat_nr": null,
  "company_handels_register": null,
  "company_employees_research": "100-500",
  "company_employees_linkedin": "250",
  "company_founded_year": "2021",
  "company_description": "AI safety and research company",
  "company_logo_url": "https://...",
  "company_size_linkedin": "201-500",
  "company_linkedin_followers": "50000",
  "company_tags": ["AI", "Research", "Safety"],
  "company_sources": ["LinkedIn", "Website"],
  "db_companies_main_created_at": "2024-01-15T10:30:00Z",
  "db_companies_main_updated_at": "2024-03-20T14:45:00Z",

  "company_names": ["Anthropic", "Anthropic PBC"],
  "company_domains": ["anthropic.com", "www.anthropic.com"],
  "company_emails": ["info@anthropic.com", "contact@anthropic.com"],
  "company_phones": ["+1-555-0100"],
  "company_linkedins": ["linkedin.com/company/anthropic"],
  "company_instagrams": [],
  "company_facebooks": [],
  "company_xings": [],
  "company_pinterests": [],
  "company_tiktoks": [],
  "company_youtubes": [],
  "company_twitters": ["twitter.com/anthropicai"],

  "company_workspace_id": "ws-abc-123",
  "company_qualified": "high",
  "company_custom_tags_ws": ["hot-lead", "enterprise"],
  "db_companies_workspace_created_at": "2024-02-01T09:00:00Z",
  "db_companies_workspace_updated_at": "2024-03-15T16:20:00Z"
}
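
A request like the one above can be issued from Python with the standard library alone. This is a minimal sketch: the function names company_get_url and fetch_company are hypothetical, and it assumes that failures arrive as a JSON object with an "error" key, as in the Error Scenarios. The HTTP status code used for errors is not documented here.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

BASE_URL = "https://web-production-603a8.up.railway.app"


def company_get_url(company_id: str, workspace_id: Optional[str] = None,
                    base_url: str = BASE_URL) -> str:
    """Build the /company_get URL; workspace_id is only added when given."""
    params = {"company_id": company_id}
    if workspace_id is not None:
        params["workspace_id"] = workspace_id
    return f"{base_url}/company_get?{urllib.parse.urlencode(params)}"


def fetch_company(api_key: str, company_id: str,
                  workspace_id: Optional[str] = None) -> dict:
    """Call /company_get and return the decoded JSON body."""
    request = urllib.request.Request(
        company_get_url(company_id, workspace_id),
        headers={"api_key": api_key},
        method="GET",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        data = json.load(response)
    # Assumption: errors are reported inside the JSON body under "error".
    if data.get("error"):
        raise RuntimeError(data["error"])
    return data
```

When workspace_id is omitted, the workspace fields in the returned dictionary are expected to be null, so callers should not treat a null company_workspace_id as a failure.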
GET

/contact_get

Purpose: Retrieve complete contact data including person information, lead information, and all identifiers. Query parameters: lead_id (required) and workspace_id (optional), e.g. lead_id=xxx&workspace_id=yyy

Logic Flow

1. Receive Request Parameters
  • Required: lead_id
  • Optional: workspace_id
2. Get Lead Data

Queries: db_leads table

Retrieves all lead fields:

  • people_id: Reference to the person record
  • companies_main_id: Reference to the company
  • lead_position, lead_position_cleaned
  • lead_seniority, lead_departement
  • still_at_company: Boolean indicating if person still works there
  • lead_start_date, lead_end_date: Employment dates
  • lead_seniority_enum, lead_departement_enum: Standardized values
  • Position variations (plural forms)
  • lead_summary: Summary of the lead
  • lead_sources: Array of data sources
  • Timestamps: created_at, updated_at

Extracts people_id to retrieve person data

3. Get Person Data

Queries: db_people table using the people_id

Retrieves all person fields:

  • Name fields (first name, last name, cleaned versions)
  • person_gender, person_language
  • Birth information (year, date, estimated year)
  • Location (country, city, state)
  • LinkedIn CV and volunteering experience
  • Career dates (education start, first job start)
  • contact_location, contact_academic_title
  • person_native_german, person_scooling_country
  • LinkedIn profile image URL
  • LinkedIn followers and connections count
  • Timestamps: created_at, updated_at
4. Get People Identifiers

Queries: db_people_identifiers table

Retrieves social profile identifiers

Organizes them into arrays:

  • contact_linkedins: Array of all LinkedIn URLs
  • contact_xings: Array of all Xing URLs
5. Get Lead Identifiers

Queries: db_leads_identifiers table

Retrieves email identifiers with their validation status

Organizes them by status into separate arrays:

  • contact_emails_valid: Array of validated emails
  • contact_emails_invalid: Array of invalid emails
  • contact_emails_catch_all: Array of catch-all emails
  • contact_emails_wrong: Array of wrong emails
  • contact_emails_unsure: Array of emails with unsure status
6. Get Workspace Data (if workspace_id provided)

Queries: db_leads_workspace table

Retrieves workspace-specific lead data:

  • lead_workspace_id: The ID of the workspace connection
  • lead_qualified_ws: Qualification status in this workspace
  • Timestamps: workspace created_at, updated_at

If not provided or not found: Returns null values for workspace fields

7. Return Response

Returns: Complete contact data dictionary with:

  • All lead fields
  • All person fields
  • People identifiers in arrays
  • Lead identifiers organized by email status
  • Workspace data (if workspace_id was provided)
  • Timestamps renamed to indicate source table
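
Step 5 above, organizing email identifiers by validation status, can be sketched like this. The function name group_emails_by_status and the (email, status) row shape are hypothetical, and the sketch assumes the stored status values match the suffixes of the response field names.

```python
from typing import Dict, Iterable, List, Tuple

# Statuses taken from the response field names listed in step 5.
EMAIL_STATUSES = ("valid", "invalid", "catch_all", "wrong", "unsure")


def group_emails_by_status(rows: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (email, status) rows into contact_emails_<status> arrays."""
    # Every array exists up front, so a status with no emails yields [].
    grouped: Dict[str, List[str]] = {
        f"contact_emails_{status}": [] for status in EMAIL_STATUSES
    }
    for email, status in rows:
        key = f"contact_emails_{status}"
        if key in grouped:  # rows with an unknown status are skipped here
            grouped[key].append(email)
    return grouped
```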
Error Handling in API File

The process_contact_get() function in api_contact_get.py uses try/except to handle errors:

Try Block:

  • Validates that lead_id is provided (if not: returns error "lead_id is required")
  • Gets database connection from pool
  • Queries db_leads table for lead data
  • If lead not found: Returns error "Lead not found"
  • Extracts people_id from lead data
  • Queries db_people table using people_id
  • Queries db_people_identifiers and organizes by type
  • Queries db_leads_identifiers and organizes by email status
  • If workspace_id provided: Queries db_leads_workspace table
  • Returns complete contact data dictionary

Except Block:

  • Catches any unexpected errors during database queries or data processing
  • Returns error response with error message
  • Examples of caught errors:
    • Database connection fails
    • Query execution errors
    • Data formatting errors

Note: No cleaning functions are called (get endpoints don't need cleaning). Database connection is managed internally.

Why Use Contact Get?
  • Contact Lookup tells you IF a contact exists and returns just the IDs
  • Contact Get gives you ALL the information about a contact once you know the lead_id
  • Use lookup first to find the contact, then use get to retrieve all details
  • Contact Get requires lead_id (not person_id) because it's designed to get the full context of a person's role at a specific company
Why Lead ID and Not Person ID?

Contact Get requires a lead_id because:

  • It's designed to show a person in the context of a specific company
  • One person can work at multiple companies (multiple leads)
  • If you used person_id, the API wouldn't know which company context to show

Example:

  • John Smith (person) works at Company A as CEO (lead 1)
  • John Smith (same person) works at Company B as Advisor (lead 2)
  • Calling contact_get with lead 1's ID shows John's CEO role at Company A
  • Calling contact_get with lead 2's ID shows John's Advisor role at Company B
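
The John Smith example can be expressed as a small data model. This is an illustrative sketch only: the Lead class, the LEADS dictionary, and all IDs are made up, and only the field names people_id, companies_main_id, and lead_position_cleaned come from the db_leads fields listed earlier.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Lead:
    lead_id: str
    people_id: str
    companies_main_id: str
    lead_position_cleaned: str


# Two leads for one person, mirroring the example (IDs are made up).
LEADS: Dict[str, Lead] = {
    "lead-1": Lead("lead-1", "person-john", "company-a", "CEO"),
    "lead-2": Lead("lead-2", "person-john", "company-b", "Advisor"),
}


def leads_for_person(people_id: str) -> List[Lead]:
    """A person_id can match several leads, so it cannot select one role."""
    return [lead for lead in LEADS.values() if lead.people_id == people_id]
```

Looking up by lead_id returns exactly one role, while leads_for_person("person-john") returns two, which is the ambiguity that a person_id parameter would introduce.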
Error Scenarios

Scenario: Lead doesn't exist

{"error": "Lead not found", "lead_id": "xxx"}

This happens when the lead_id doesn't exist in the database

Scenario: Lead exists but person data missing

Complete lead data with null values for all person fields

This is rare but can happen if data integrity issues exist

Scenario: Lead exists but not in workspace

Complete contact data with workspace fields as null

Example: lead_workspace_id: null, lead_qualified_ws: null

Scenario: Database error

{"error": "Database error: [error details]", "lead_id": "xxx"}

Scenario: Empty identifier arrays

contact_linkedins: []

Always returns arrays (even if empty). This is normal: not all contacts have every type of identifier

Practical Example

Request:

GET /contact_get?lead_id=lead-789-xyz&workspace_id=workspace-456

Response:

{
  "people_id": "person-abc-123",
  "companies_main_id": "company-xyz-789",
  "lead_position": "Chief Executive Officer",
  "lead_position_cleaned": "CEO",
  "lead_seniority": "C-Level",
  "lead_departement": "Executive",
  "still_at_company": true,
  "lead_start_date": "2020-01-01",
  "lead_end_date": null,
  "lead_seniority_enum": "c_level",
  "lead_departement_enum": "executive",
  "lead_position_clean_plural_dativ": "CEOs",
  "lead_position_clean_plural_nominativ": "CEOs",
  "lead_summary": "Experienced executive in AI industry",
  "lead_sources": ["LinkedIn", "Company Website"],
  "db_leads_created_at": "2024-01-10T11:00:00Z",
  "db_leads_updated_at": "2024-03-18T15:30:00Z",

  "person_first_name": "John",
  "contact_first_name_cleaned": "John",
  "person_last_name": "Smith",
  "contact_last_name_cleaned": "Smith",
  "person_gender": "male",
  "person_language": "English",
  "contact_estimated_birth_year": "1980",
  "contact_birth_year": "1980",
  "contact_birth_date": "1980-05-15",
  "person_country": "USA",
  "person_city": "San Francisco",
  "linkedin_cv": "Detailed career history...",
  "linkedin_volunteerings": "Board member at...",
  "started_education_linkedin": "1998",
  "first_job_start_linkedin": "2002",
  "contact_location": "San Francisco, CA",
  "contact_academic_title": "PhD",
  "person_state": "California",
  "person_native_german": false,
  "person_scooling_country": "USA",
  "contact_linkedin_image_url": "https://...",
  "person_linkedin_followers": "5000",
  "person_linkedin_connections": "500+",
  "db_people_created_at": "2024-01-05T09:00:00Z",
  "db_people_updated_at": "2024-03-10T14:00:00Z",

  "contact_linkedins": [
    "linkedin.com/in/johnsmith",
    "linkedin.com/in/john-smith-ceo"
  ],
  "contact_xings": [],

  "contact_emails_valid": [
    "john@company.com",
    "john.smith@company.com"
  ],
  "contact_emails_invalid": [],
  "contact_emails_catch_all": [
    "info@company.com"
  ],
  "contact_emails_wrong": [],
  "contact_emails_unsure": [
    "j.smith@company.com"
  ],

  "lead_workspace_id": "lead-ws-xyz-123",
  "lead_qualified_ws": "qualified",
  "db_leads_workspace_created_at": "2024-02-05T10:00:00Z",
  "db_leads_workspace_updated_at": "2024-03-12T16:45:00Z"
}
POST

/company_push

Purpose: Look up company and automatically create it if not found. Also creates workspace connection if workspace_id is provided and connection doesn't exist.

Logic Flow

1. Receive Request Data

Accepts: All company fields, workspace_id

2. Set Connection Pools

Sets connection pool for: company_lookup, company_push, company_workspace_push

3. Clean the Data

Calls: clean_company_fields() from processing_cleaning_funtions/company_field_cleaning.py

What it does: Normalizes company data

4. Lookup Company

Calls: lookup_company() from processing_lookup_funtions/company_lookup.py

NO CACHING - Always fresh lookup to determine correct status

Returns: company_main_id and company_workspace_id (if found)

5. Determine Company Status

If company NOT found:

  • Calls: push_company() from processing_push_funtions/company_push.py
  • What it does: Creates new company record in db_companies_main and identifiers in db_companies_dt_identifiers
  • Sets status_company = "created"
  • Updates company_main_id with newly created ID

If company found:

  • Sets status_company = "found"
  • No push operation needed
6. Determine Workspace Status (if workspace_id provided and company exists)

If workspace connection NOT found:

  • Calls: push_company_workspace() from processing_push_funtions/company_workspace_push.py
  • What it does: Creates workspace connection in db_companies_workspace
  • Sets status_company_workspace = "created"
  • Updates company_workspace_id with newly created ID

If workspace connection found:

  • Sets status_company_workspace = "found"
  • No push operation needed
7. Return Response

Returns: {company_main_id, company_workspace_id, status_company, status_company_workspace, error}

  • status_company: "found" or "created"
  • status_company_workspace: "found" or "created" (or null if no workspace_id provided)
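
Steps 4 to 7 above can be sketched as a single function. This is a simplified illustration, not the code of process_company_push(): cleaning and error handling are left out, and the three callables passed in are hypothetical stand-ins for lookup_company(), push_company(), and push_company_workspace(), whose real signatures are not documented here.

```python
from typing import Any, Callable, Dict, Optional


def company_push_flow(
    data: Dict[str, Any],
    lookup: Callable[[Dict[str, Any]], Dict[str, Any]],
    push_company: Callable[[Dict[str, Any]], str],
    push_workspace: Callable[[str, str], str],
) -> Dict[str, Optional[str]]:
    """Look up a company, create what is missing, and report statuses."""
    workspace_id = data.get("workspace_id")
    found = lookup(data)
    company_main_id = found.get("company_main_id")
    company_workspace_id = found.get("company_workspace_id")

    # Step 5: create the company only when the lookup found nothing.
    if company_main_id is None:
        company_main_id = push_company(data)
        status_company = "created"
    else:
        status_company = "found"

    # Step 6: the workspace status stays None when no workspace_id was sent.
    status_workspace = None
    if workspace_id is not None:
        if company_workspace_id is None:
            company_workspace_id = push_workspace(company_main_id, workspace_id)
            status_workspace = "created"
        else:
            status_workspace = "found"

    return {
        "company_main_id": company_main_id,
        "company_workspace_id": company_workspace_id,
        "status_company": status_company,
        "status_company_workspace": status_workspace,
        "error": None,
    }
```

Passing the functions in as arguments keeps the sketch runnable without a database and makes the found/created branches easy to test with fakes.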
Error Handling in API File

The process_company_push() function handles errors at multiple stages:

Validation:

  • If no data provided: Returns error "No data provided"

Lookup Error:

  • If lookup_company() returns error: Returns error from lookup function

Push Errors:

  • If push_company() fails: Returns error "company_push failed" with details
  • If push_company_workspace() fails: Returns company_main_id and status_company, but error for workspace operation

Called Functions (Brief Description):

  • clean_company_fields(): Normalizes company data
  • lookup_company(): Searches database for company
  • push_company(): Creates new company record with identifiers, returns company_id
  • push_company_workspace(): Creates workspace connection, returns workspace connection id
POST

/contact_push

Purpose: Look up contact and automatically create person, lead, and workspace connections if not found.

Logic Flow

1. Receive Request Data

Accepts: All contact fields, company_id, workspace_id

2. Set Connection Pools

Sets connection pool for: contact_lookup, people_push, lead_push, people_workspace_push, lead_workspace_push

3. Extract IDs

Extracts company_id and workspace_id from data

4. Clean the Data

Calls: clean_contact_fields() from processing_cleaning_funtions/contact_field_cleaning.py

What it does: Normalizes contact data

5. Lookup Contact

Calls: lookup_contact() from processing_lookup_funtions/contact_lookup.py

NO CACHING - Always fresh lookup

Returns: lead_id, person_id, lead_workspace_id, people_workspace_id

6. Determine Person Status

If person NOT found:

  • Calls: create_person_record() from processing_push_funtions/people_push.py
  • What it does: Creates new person record in db_people and identifiers in db_people_identifiers
  • Sets status_person = "created"

If person found:

  • Sets status_person = "found"
7. Determine Lead Status (if person_id and company_id exist)

If lead NOT found:

  • Calls: create_lead_record() from processing_push_funtions/lead_push.py
  • What it does: Creates new lead record in db_leads connecting person to company, creates email identifiers in db_leads_identifiers
  • Sets status_lead = "created"

If lead found:

  • Sets status_lead = "found"
8. Determine People Workspace Status (if workspace_id and person_id exist)

If people workspace NOT found:

  • Calls: create_people_workspace_connection() from processing_push_funtions/people_workspace_push.py
  • What it does: Creates workspace connection in db_people_workspace
  • Sets status_people_workspace = "created"

If people workspace found:

  • Sets status_people_workspace = "found"
9. Determine Lead Workspace Status (if workspace_id and lead_id exist)

If lead workspace NOT found:

  • Calls: create_lead_workspace_connection() from processing_push_funtions/lead_workspace_push.py
  • What it does: Creates workspace connection in db_leads_workspace
  • Sets status_lead_workspace = "created"

If lead workspace found:

  • Sets status_lead_workspace = "found"
10. Return Response

Returns: {person_id, lead_id, people_workspace_id, lead_workspace_id, status_person, status_lead, status_people_workspace, status_lead_workspace, error}

All status values: "found" or "created"

Error Handling in API File

The process_contact_push() function handles errors at multiple stages:

Validation:

  • If no data provided: Returns error "No data provided"

Lookup Error:

  • If lookup_contact() returns error: Returns all null IDs with error

Push Errors (cascading returns):

  • If people_push fails: Returns error immediately
  • If lead_push fails: Returns person_id and status_person, but error for lead
  • If people_workspace_push fails: Returns person_id, lead_id, and statuses, but error for people workspace
  • If lead_workspace_push fails: Returns all IDs and statuses except lead workspace
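
The cascading returns above follow one pattern: run the steps in order and, on the first failure, return everything gathered so far together with an error. A minimal sketch of that pattern is shown below. The function name run_cascade, the step signature, and the error message format are hypothetical. Unlike the real response, the sketch omits keys for steps that never ran instead of returning them as null.

```python
from typing import Any, Callable, Dict, List, Tuple

# A step has a name and a function that maps the results so far to new results.
Step = Tuple[str, Callable[[Dict[str, Any]], Dict[str, Any]]]


def run_cascade(steps: List[Step]) -> Dict[str, Any]:
    """Run steps in order; on the first failure return what was gathered so far."""
    result: Dict[str, Any] = {"error": None}
    for name, step in steps:
        try:
            # Each step sees earlier results and returns new IDs and statuses.
            result.update(step(result))
        except Exception as exc:
            result["error"] = f"{name} failed: {exc}"
            break  # later steps depend on this one, so stop here
    return result
```

Stopping at the first failure preserves the IDs of records that were already created, so the caller can retry without creating duplicates.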

Called Functions (Brief Description):

  • clean_contact_fields(): Normalizes contact data
  • lookup_contact(): Searches database for contact using two-phase lookup
  • create_person_record(): Creates person in db_people with identifiers
  • create_lead_record(): Creates lead in db_leads with email identifiers
  • create_people_workspace_connection(): Creates workspace link for person
  • create_lead_workspace_connection(): Creates workspace link for lead
POST

/company_push_patch

Purpose: Look up company and CREATE if not found OR UPDATE if found. Same for workspace connections.

Logic Flow

1. Receive Request Data

Accepts: All company fields, workspace_id

2. Set Connection Pools

Sets connection pool for: company_lookup, company_push, company_workspace_push, company_update, company_workspace_update

3. Clean the Data

Calls: clean_company_fields()

What it does: Normalizes company data

4. Lookup Company

Calls: lookup_company()

NO CACHING - Always fresh lookup

5. Determine Company Status and Action

If company NOT found:

  • Calls: push_company() from processing_push_funtions/company_push.py
  • What it does: Creates new company record
  • Sets status_company = "created"

If company found:

  • Calls: update_company_identifiers() from processing_update_funtions/company_update.py
  • What it does: Updates existing company fields and adds new identifiers (non-destructive)
  • Sets status_company = "updated"
6. Determine Workspace Status and Action (if workspace_id and company_main_id exist)

If workspace connection NOT found:

  • Calls: push_company_workspace() from processing_push_funtions/company_workspace_push.py
  • What it does: Creates workspace connection
  • Sets status_company_workspace = "created"

If workspace connection found:

  • Calls: update_company_workspace() from processing_update_funtions/company_workspace_update.py
  • What it does: Updates workspace connection fields (non-destructive)
  • Sets status_company_workspace = "updated"
7. Return Response

Returns: {company_main_id, company_workspace_id, status_company, status_company_workspace, error}

  • status_company: "created" or "updated"
  • status_company_workspace: "created" or "updated"
Error Handling in API File

The process_company_push_patch() function handles errors at multiple stages:

Validation:

  • If no data provided: Returns error "No data provided"

Lookup Error:

  • If lookup_company() returns error: Returns error from lookup

Push/Update Errors:

  • If push_company() fails: Returns error "company_push failed"
  • If update_company_identifiers() fails: Returns company_main_id but error for update
  • If push_company_workspace() fails: Returns company info but error for workspace
  • If update_company_workspace() fails: Returns company info but error for workspace update

Called Functions (Brief Description):

  • clean_company_fields(): Normalizes company data
  • lookup_company(): Searches database for company
  • push_company(): Creates new company with identifiers
  • update_company_identifiers(): Updates company fields and adds new identifiers without removing existing data
  • push_company_workspace(): Creates workspace connection
  • update_company_workspace(): Updates workspace fields without removing existing data
POST

/contact_push_patch

Purpose: Look up contact and CREATE if not found OR UPDATE if found for person, lead, and workspace connections.

Logic Flow

1. Receive Request Data

Accepts: All contact fields, company_id, workspace_id

2. Set Connection Pools

Sets connection pool for: contact_lookup, people_push, lead_push, people_workspace_push, lead_workspace_push, people_update, lead_update, people_workspace_update, lead_workspace_update

3. Extract IDs and Clean Data

Extracts company_id and workspace_id

Calls: clean_contact_fields()

4. Lookup Contact

Calls: lookup_contact()

NO CACHING

5. Determine Person Status and Action

If person NOT found:

  • Calls: create_person_record() from processing_push_funtions/people_push.py
  • Sets status_person = "created"

If person found:

  • Calls: update_person_identifiers() from processing_update_funtions/people_update.py
  • What it does: Updates person fields and adds new identifiers (non-destructive)
  • Sets status_person = "updated"
6. Determine Lead Status and Action (if person_id and company_id exist)

If lead NOT found:

  • Calls: create_lead_record() from processing_push_funtions/lead_push.py
  • Sets status_lead = "created"

If lead found:

  • Calls: update_lead_identifiers() from processing_update_funtions/lead_update.py
  • What it does: Updates lead fields and adds new email identifiers (non-destructive)
  • Sets status_lead = "updated"
7. Determine People Workspace Status and Action (if workspace_id and person_id exist)

If people workspace NOT found:

  • Calls: create_people_workspace_connection() from processing_push_funtions/people_workspace_push.py
  • Sets status_people_workspace = "created"

If people workspace found:

  • Calls: update_people_workspace() from processing_update_funtions/people_workspace_update.py
  • What it does: Updates workspace fields (non-destructive)
  • Sets status_people_workspace = "updated"
8. Determine Lead Workspace Status and Action (if workspace_id and lead_id exist)

If lead workspace NOT found:

  • Calls: create_lead_workspace_connection() from processing_push_funtions/lead_workspace_push.py
  • Sets status_lead_workspace = "created"

If lead workspace found:

  • Calls: update_lead_workspace() from processing_update_funtions/lead_workspace_update.py
  • What it does: Updates workspace qualification status
  • Sets status_lead_workspace = "updated"
9. Return Response

Returns: {person_id, lead_id, people_workspace_id, lead_workspace_id, status_person, status_lead, status_people_workspace, status_lead_workspace, error}

All status values: "created" or "updated"

Error Handling in API File

The process_contact_push_patch() function handles errors at multiple stages:

Validation:

  • If no data provided: Returns error "No data provided"

Lookup Error:

  • If lookup_contact() returns error: Returns all null IDs with error

Push/Update Errors (cascading returns):

  • If people_push/update fails: Returns error immediately
  • If lead_push/update fails: Returns person info but error for lead
  • If people_workspace_push/update fails: Returns person and lead info but error for people workspace
  • If lead_workspace_push/update fails: Returns all info except lead workspace

Called Functions (Brief Description):

  • clean_contact_fields(): Normalizes contact data
  • lookup_contact(): Searches database for contact
  • create_person_record(): Creates person with identifiers
  • update_person_identifiers(): Updates person fields and adds new identifiers
  • create_lead_record(): Creates lead with email identifiers
  • update_lead_identifiers(): Updates lead fields and adds new email identifiers
  • create_people_workspace_connection(): Creates workspace link for person
  • update_people_workspace(): Updates people workspace fields
  • create_lead_workspace_connection(): Creates workspace link for lead
  • update_lead_workspace(): Updates lead workspace qualification status
POST

/company_delete_fields

Purpose: Delete specific fields from company records in db_companies_main, db_companies_dt_identifiers, and db_companies_workspace.

Logic Flow

1. Receive Request Data
  • Required: company_id
  • Optional: Fields to delete (each field as boolean true or with value)
2. Validate Data
  • Checks that company_id is provided
  • Returns error if company_id missing
3. Get Database Connection
  • Gets connection from pool
  • Creates cursor for queries
4. Part 1: Handle Boolean Fields
  • For fields marked as true in request
  • Sets fields to NULL in db_companies_main
  • Fields: company_name_cleaned, legal_form, address fields, registration numbers, employee counts, etc.
  • Executes: UPDATE db_companies_main SET field = NULL WHERE id = company_id
5. Part 2: Handle Array Fields
  • For array fields with values (company_tags, company_sources)
  • Parses comma-separated values
  • Removes each item from array
  • Executes: UPDATE db_companies_main SET field = array_remove(field, item) WHERE id = company_id
6. Part 3: Handle Identifier Fields
  • For identifier fields with values (name, domain, linkedin, emails, phones, social media)
  • Deletes matching records from db_companies_dt_identifiers
  • Executes: DELETE FROM db_companies_dt_identifiers WHERE companies_main_id = company_id AND identifier = value AND type = type
7. Part 4: Handle Workspace Fields (if company_workspace_id provided)
  • For company_qualified field: Sets to NULL
  • For company_custom_tags_ws: Removes items from array
  • Executes on db_companies_workspace table
8. Commit and Return
  • Commits all changes
  • Returns success with list of operations performed
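
Parts 1 to 3 above translate a request into three kinds of SQL statement. The sketch below builds those statements as parameterized queries without executing them. It is an illustration under assumptions: the function name build_delete_statements, the short field lists, and the request keys company_domain and company_linkedin with their type values are hypothetical. The table and column names come from the statements quoted above.

```python
from typing import Any, Dict, List, Tuple

# Illustrative subsets only; the real field lists are longer.
NULLABLE_FIELDS = {"company_name_cleaned", "company_legal_form", "company_description"}
ARRAY_FIELDS = {"company_tags", "company_sources"}
# Hypothetical mapping from request field to identifier type.
IDENTIFIER_FIELDS = {"company_domain": "domain", "company_linkedin": "linkedin"}

Statement = Tuple[str, Tuple[Any, ...]]


def build_delete_statements(request: Dict[str, Any]) -> List[Statement]:
    """Translate a delete request into parameterized SQL (Parts 1-3 only)."""
    company_id = request.get("company_id")
    if not company_id:
        raise ValueError("company_id is required")
    statements: List[Statement] = []
    for field, value in request.items():
        if field in NULLABLE_FIELDS and value is True:
            # Field names come from the allow-list above, never from user input.
            sql = f"UPDATE db_companies_main SET {field} = NULL WHERE id = %s"
            statements.append((sql, (company_id,)))
        elif field in ARRAY_FIELDS and value:
            # Comma-separated values: one array_remove per item.
            for item in (part.strip() for part in str(value).split(",")):
                if item:
                    sql = (f"UPDATE db_companies_main SET {field} = "
                           f"array_remove({field}, %s) WHERE id = %s")
                    statements.append((sql, (item, company_id)))
        elif field in IDENTIFIER_FIELDS and value:
            sql = ("DELETE FROM db_companies_dt_identifiers "
                   "WHERE companies_main_id = %s AND identifier = %s AND type = %s")
            statements.append((sql, (company_id, value, IDENTIFIER_FIELDS[field])))
    return statements
```

Values are passed as %s parameters, the placeholder style psycopg2 uses, while column names are interpolated only after being checked against a fixed allow-list, which avoids SQL injection through field names.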
Error Handling in API File

The process_company_delete_fields() function uses try/except with rollback:

Try Block:

  • Validates company_id is provided
  • Gets database connection from pool
  • Executes multiple UPDATE and DELETE queries
  • Tracks operations performed
  • Commits transaction

Except Block - Database Errors:

  • Catches psycopg2.Error (database-specific errors)
  • Rolls back transaction if error occurs
  • Closes cursor if open
  • Releases connection back to pool
  • Returns: (False, "Database error: [error details]")

Except Block - General Errors:

  • Catches any other Exception
  • Rolls back transaction if error occurs
  • Closes cursor if open
  • Releases connection back to pool
  • Returns: (False, "Error: [error details]")

Finally Logic (implicit in except blocks):

  • Always attempts to close cursor
  • Always attempts to rollback on error
  • Always releases connection back to pool
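
The try/except structure described above can be sketched as follows. This is not the code of process_company_delete_fields(): the function name run_delete_transaction and its parameters are hypothetical. The pool is assumed to offer getconn() and putconn(), as psycopg2 connection pools do, and db_error stands in for psycopg2.Error so that the sketch runs without the library installed.

```python
from typing import Any, Callable, List, Tuple, Type


def run_delete_transaction(
    pool: Any,
    work: Callable[[Any], List[str]],
    db_error: Type[Exception],
) -> Tuple[bool, Any]:
    """Run work(cursor) in one transaction with rollback and cleanup."""
    conn = pool.getconn()
    cursor = None
    try:
        cursor = conn.cursor()
        operations = work(cursor)  # executes the UPDATE and DELETE queries
        conn.commit()
        return True, operations
    except db_error as exc:
        # Database-specific errors: undo everything done so far.
        conn.rollback()
        return False, f"Database error: {exc}"
    except Exception as exc:
        # Any other failure is rolled back the same way.
        conn.rollback()
        return False, f"Error: {exc}"
    finally:
        # Runs on success and failure alike.
        if cursor is not None:
            cursor.close()
        pool.putconn(conn)
```

Here the cleanup lives in an explicit finally block rather than being repeated in each except block. With psycopg2 installed, a caller would pass psycopg2.Error as db_error.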

No Called Processing Functions: This file directly executes database queries without calling separate processing modules.

POST

/contact_delete_fields

Purpose: Delete specific fields from contact records in db_people, db_leads, db_people_identifiers, db_leads_identifiers, and workspace tables.

Logic Flow

1. Receive Request Data
  • Required: At least one of people_id or lead_id
  • Optional: people_workspace_id, leads_workspace_id, fields to delete
2. Validate Data
  • Checks that at least people_id OR lead_id is provided
  • Returns error if both missing
3. Get Database Connection
  • Gets connection from pool
  • Creates cursor for queries
4. Part 1: Handle People Boolean Fields (if people_id provided)
  • For fields marked as true in request
  • Sets fields to NULL in db_people
  • Fields: name, gender, language, birth info, location, LinkedIn data, etc.
  • Executes: UPDATE db_people SET field = NULL WHERE id = people_id
5. Part 2: Handle Lead Boolean Fields (if lead_id provided)
  • For fields marked as true in request
  • Sets fields to NULL in db_leads
  • Fields: position, seniority, department, dates, summary, etc.
  • Executes: UPDATE db_leads SET field = NULL WHERE id = lead_id
6. Part 3: Handle Lead Array Fields (if lead_id provided)
  • For lead_sources array field
  • Parses comma-separated values
  • Removes each item from array
  • Executes: UPDATE db_leads SET lead_sources = array_remove(lead_sources, item) WHERE id = lead_id
7. Part 4: Handle People Identifiers (if people_id provided)
  • For identifier fields with values (contact_linkedin, contact_xing)
  • Deletes matching records from db_people_identifiers
  • Executes: DELETE FROM db_people_identifiers WHERE people_id = people_id AND contact_ident_identifier = value AND contact_ident_type = type
8. Part 5: Handle Lead Identifiers (if lead_id provided)
  • For email identifier fields with values (valid, catch_all, invalid, unsure)
  • Deletes matching records from db_leads_identifiers
  • Executes: DELETE FROM db_leads_identifiers WHERE leads_id = lead_id AND lead_ident_identifier = email AND lead_ident_type = 'email' AND lead_ident_status = status
9. Part 6: Handle People Workspace (if people_workspace_id provided)
  • Note: db_people_workspace only contains IDs, no other fields to delete
  • Logs operation performed
10. Part 7: Handle Leads Workspace (if leads_workspace_id provided)
  • For lead_qualified_ws field: Sets to NULL
  • Executes: UPDATE db_leads_workspace SET lead_qualified_ws = NULL WHERE id = leads_workspace_id
11. Commit and Return
  • Commits all changes
  • Returns success with list of operations performed
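
Part 3 of the flow above can be sketched as a small helper that turns the comma-separated lead_sources value into one array_remove statement per item. The helper name build_lead_source_removals is hypothetical and the real function may structure this differently; the table, column, and SQL text come from the flow above.

```python
def build_lead_source_removals(lead_sources, lead_id):
    """Return one (sql, params) pair per item to remove from lead_sources.

    Hypothetical helper: only the SQL text is taken from the documentation.
    """
    sql = (
        "UPDATE db_leads "
        "SET lead_sources = array_remove(lead_sources, %s) "
        "WHERE id = %s"
    )
    # Split on commas, trim whitespace, and skip empty entries.
    items = [item.strip() for item in lead_sources.split(",") if item.strip()]
    return [(sql, (item, lead_id)) for item in items]


# Example values are made up for illustration.
statements = build_lead_source_removals("webinar, linkedin_ads", "lead-456")
for sql, params in statements:
    print(params)
```

Each pair can be passed to cursor.execute(sql, params) inside the same transaction, so that a failure in any part still rolls back everything.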
Error Handling in API File

The process_contact_delete_fields() function uses try/except with rollback:

Try Block:

  • Validates at least one ID is provided
  • Gets database connection from pool
  • Executes multiple UPDATE and DELETE queries across multiple tables
  • Tracks operations performed
  • Commits transaction

Except Block - Database Errors:

  • Catches psycopg2.Error (database-specific errors)
  • Rolls back transaction if error occurs
  • Closes cursor if open
  • Releases connection back to pool
  • Returns: (False, "Database error: [error details]")

Except Block - General Errors:

  • Catches any other Exception
  • Rolls back transaction if error occurs
  • Closes cursor if open
  • Releases connection back to pool
  • Returns: (False, "Error: [error details]")

Finally Logic (implicit in except blocks):

  • Always attempts to close cursor
  • Always attempts to rollback on error
  • Always releases connection back to pool

No Called Processing Functions: This file directly executes database queries without calling separate processing modules.

Key Differences Between Endpoints

Push vs Push/Patch

Push Endpoints
  • Found: Returns "found" status, NO update
  • Not Found: Creates new record, returns "created" status
Push/Patch Endpoints
  • Found: Updates existing record, returns "updated" status
  • Not Found: Creates new record, returns "created" status

Update Behavior (Push/Patch only)

All update functions are non-destructive:

  • Existing fields: Keep current values
  • New fields: Add new values
  • Arrays: Append new items (don't remove existing)
  • Identifiers: Add new identifiers (don't remove existing)
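
The non-destructive rules above can be illustrated in plain Python, without a database. This is a minimal sketch that assumes a regular field is overwritten only when the request supplies a non-empty value; merge_non_destructive and its array_fields parameter are hypothetical names, not part of the API.

```python
def merge_non_destructive(existing, incoming, array_fields=("company_tags",)):
    """Merge incoming values into an existing record without removing data."""
    merged = dict(existing)
    for key, value in incoming.items():
        if value is None or value == "":
            continue  # nothing provided: keep the current value
        if key in array_fields:
            current = list(merged.get(key) or [])
            for item in value:
                if item not in current:  # append only items not yet present
                    current.append(item)
            merged[key] = current
        else:
            merged[key] = value  # a new value was provided
    return merged


result = merge_non_destructive(
    {"company_city": "Berlin", "company_tags": ["tag1"]},
    {"company_city": None, "company_tags": ["tag2"]},
)
print(result)
```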

Delete Behavior

Delete endpoints are selective:

  • Boolean fields with value true: Set to NULL
  • Array fields with values: Remove only specified items
  • Identifier fields with values: Delete only specified identifiers
  • Uses explicit DELETE queries for identifier tables

Error Handling Patterns

Push/Push-Patch Files

Pattern: Cascading returns with partial success

  • If early operation fails: Return error immediately
  • If later operation fails: Return successful IDs but error for failed operation

Example:

If person created but lead fails, return person_id and status_person, but error for lead

Delete Files

Pattern: All-or-nothing transaction with rollback

  • If any operation fails: Rollback entire transaction
  • Return success only if all operations commit
  • Uses explicit transaction management

Connection Management

All Files:

  • Get connection from shared pool
  • Always release connection back to pool (even on error)
  • Use try/except for cleanup (cleanup is performed inside the except blocks rather than in a finally clause)

Summary: Lookup vs Get

Lookup Endpoints (Search)

  • Purpose: Find if something exists
  • Input: Identifiers (name, domain, email, etc.)
  • Output: IDs only
  • Use Case: "Does this company/contact exist in our database?"
  • Caching: Yes (1 hour)
  • Returns null when not found: This is normal, not an error

Get Endpoints (Retrieve)

  • Purpose: Get complete information
  • Input: ID (company_id or lead_id)
  • Output: All data fields, all identifiers, workspace connections
  • Use Case: "Give me everything you know about this company/contact"
  • Caching: No
  • Returns error when not found: ID doesn't exist in database

Typical Workflows

Workflow 1: Check if company exists, then get details
1. POST /company_lookup with domain="anthropic.com"
2. Response: {company_main_id: "uuid-123", ...}
3. GET /company_get?company_id=uuid-123
4. Response: {complete company data...}
Workflow 2: Search for contact, then get full info
1. POST /contact_lookup with email="john@company.com" and company_id="company-uuid"
2. Response: {lead_id: "lead-456", person_id: "person-789", ...}
3. GET /contact_get?lead_id=lead-456
4. Response: {complete contact data including lead and person info...}
Workflow 3: Check workspace membership
1. POST /company_lookup with domain="example.com" and workspace_id="ws-123"
2. Response: {company_main_id: "uuid-123", company_workspace_id: null, ...}
3. Interpretation: Company exists but is NOT in workspace ws-123
4. Next step: Use company_push to add it to the workspace
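
The interpretation step in Workflow 3 can be written as a small client-side helper. This is a sketch only: the function name and the three status labels are hypothetical, and it assumes the lookup response is a JSON object (or null) that carries the company_main_id and company_workspace_id keys shown above.

```python
def interpret_company_lookup(response, workspace_id=None):
    """Classify a /company_lookup response (status labels are illustrative)."""
    if not response or response.get("company_main_id") is None:
        return "not_found"  # normal outcome, not an error
    if workspace_id is not None and response.get("company_workspace_id") is None:
        return "exists_not_in_workspace"  # candidate for company_push
    return "found"


status = interpret_company_lookup(
    {"company_main_id": "uuid-123", "company_workspace_id": None},
    workspace_id="ws-123",
)
print(status)
```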

Common Questions

Q: When should I use workspace_id?

Use workspace_id in lookups when:

  • You want to check if a company/contact is already in a specific workspace
  • You're building workspace-specific features (like workspace dashboards)

Use workspace_id in get calls when:

  • You want workspace-specific data (qualification status, custom tags)
  • You're displaying data in a workspace context

Don't use workspace_id when:

  • You're doing a global search across all workspaces
  • You want to find all instances of a company/contact regardless of workspace

Q: Why does contact lookup return person_id without lead_id sometimes?

This happens in Phase 2 of contact lookup:

  • You searched for a person (by LinkedIn/Xing) without providing company_id, OR
  • You provided company_id but the person doesn't work at that company
  • The system found the person in the database but couldn't find a lead (company connection)

This means: "We know this person exists, but we don't have employment data for the company you specified"

Q: What's the difference between company_name and company_name_cleaned?

  • company_name: Stored in db_companies_dt_identifiers - can have multiple variations (old names, alternate spellings)
  • company_name_cleaned: Stored in db_companies_main - the current, standardized company name

Example:

  • company_names array: ["Facebook", "Facebook Inc", "Meta"]
  • company_name_cleaned: "Meta Platforms Inc"

Q: Why are identifiers in separate arrays by type?

For historical tracking:

  • Company domains change (rebrandings, mergers)
  • People change names (marriage, legal name changes)
  • Email addresses change (job changes)

For matching flexibility:

  • Search with old domain, find company by current domain
  • Search with old email, find person by current email
  • Match against ANY stored identifier

Q: Can I search for a person without knowing their company?

Yes, in contact_lookup:

  • Don't provide company_id
  • Provide LinkedIn or Xing URL
  • The lookup will run Phase 2 (person-only search)
  • You'll get person_id but not lead_id

This means: "Found the person, but don't know where they work"

Q: What happens if I provide multiple identifiers that match different companies?

The system returns the first match found:

  • Example: domain="company-a.com", linkedin="linkedin.com/company/company-b"
  • If company A is found first in database, it returns company A
  • No ranking or scoring is performed

Best practice: Use the most specific/reliable identifier you have

Database Tables Reference

Company Tables

v_company_lookup

Optimized view for searching companies (contains arrays of identifiers)

db_companies_main

Main company data (single record per company)

db_companies_dt_identifiers

Company identifiers (multiple records per company, one per identifier)

db_companies_workspace

Workspace connections (one record per company-workspace pair)

Contact Tables

v_contact_lookup

Optimized view for searching contacts (contains arrays of identifiers)

db_leads

Lead data - person at company (one record per person-company relationship)

db_people

Person data - individual information (single record per person)

db_people_identifiers

Person identifiers like LinkedIn/Xing (multiple records per person)

db_leads_identifiers

Lead identifiers like emails (multiple records per lead)

db_leads_workspace

Lead workspace connections (one record per lead-workspace pair)

db_people_workspace

Person workspace connections (one record per person-workspace pair)

Table Relationships

Company Structure:
db_companies_main (1) ←→ (many) db_companies_dt_identifiers
                  (1) ←→ (many) db_companies_workspace
Contact Structure:
db_people (1) ←→ (many) db_people_identifiers
          (1) ←→ (many) db_people_workspace
          (1) ←→ (many) db_leads

db_leads (1) ←→ (many) db_leads_identifiers
         (1) ←→ (many) db_leads_workspace
         (many) ←→ (1) db_companies_main
Key Points:
  • One person can have many leads (worked at multiple companies)
  • One lead belongs to one person and one company
  • Identifiers are stored separately for historical tracking
  • Workspace connections are separate for each entity

Functions

Processing functions used across API endpoints for data cleaning, validation, and normalization.

This document explains the logic flow for all processing functions organized by functional area. Each section describes what each function does, its complete logic flow, database operations, error handling, and return values.

1. Processing Cleaning Functions

1.1 Contact Field Cleaning (contact_field_cleaning.py)

Purpose:

Normalizes and validates contact/lead field data before database operations.

normalize_quotes(text: str) → str

Purpose:

Converts all Unicode quote characters to standard ASCII quotes.

Logic Flow:

  1. Check if input is string type, return as-is if not
  2. Replace all smart quotes and quote-like characters:
    • Right single quotation mark (U+2019) → apostrophe
    • Left single quotation mark (U+2018) → apostrophe
    • Grave accent/backtick (U+0060) → apostrophe
    • Acute accent (U+00B4) → apostrophe
    • Left double quotation mark (U+201C) → double quote
    • Right double quotation mark (U+201D) → double quote
  3. Return normalized text

Database Operations:

None

Return Value:

Normalized string with standard ASCII quotes
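
The logic flow above maps directly onto a character translation table. The following is a minimal sketch of that flow, not the code in contact_field_cleaning.py, which may be implemented differently.

```python
def normalize_quotes(text):
    """Replace Unicode quote-like characters with ASCII quotes."""
    if not isinstance(text, str):
        return text  # non-string input is returned unchanged
    replacements = {
        "\u2019": "'",  # right single quotation mark
        "\u2018": "'",  # left single quotation mark
        "\u0060": "'",  # grave accent / backtick
        "\u00b4": "'",  # acute accent
        "\u201c": '"',  # left double quotation mark
        "\u201d": '"',  # right double quotation mark
    }
    return text.translate(str.maketrans(replacements))


print(normalize_quotes("\u201cO\u2019Brien\u201d"))
```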

clean_contact_fields(contact_data: dict) → dict

Purpose:

Validates and normalizes all contact fields including social media URLs and removes empty values.

Logic Flow:

  1. Normalize Quotes in All String Fields - Iterate through all fields and call normalize_quotes() for each string value
  2. Clean LinkedIn URL - Extract person slug, validate format, format as https://www.linkedin.com/in/{slug}
  3. Clean Xing URL - Extract person slug, validate format, format as https://www.xing.com/people/{slug}
  4. Remove Empty Fields - Delete all fields with None, empty string, or blank values

Example:

Input: {"contact_linkedin": "https://linkedin.com/in/john-doe?param=123", "contact_email_valid": "john@example.com", "contact_xing": ""}
Output: {"contact_linkedin": "https://www.linkedin.com/in/john-doe", "contact_email_valid": "john@example.com"}
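
The LinkedIn and empty-field steps can be reproduced with a short sketch. The regular expression and all three function names below are assumptions made for illustration; the real validation rules are not documented here and may be stricter.

```python
import re

# Assumed pattern: capture the slug after /in/ up to the next "/", "?", or "#".
_LINKEDIN_PERSON = re.compile(r"linkedin\.com/in/([^/?#\s]+)", re.IGNORECASE)


def clean_contact_linkedin(url):
    """Return the canonical profile URL, or None if no slug can be extracted."""
    if not isinstance(url, str):
        return None
    match = _LINKEDIN_PERSON.search(url)
    if not match:
        return None
    return f"https://www.linkedin.com/in/{match.group(1)}"


def remove_empty_fields(data):
    """Drop fields whose value is None, empty, or whitespace only."""
    return {k: v for k, v in data.items() if v is not None and str(v).strip() != ""}


def clean_contact_fields_sketch(contact_data):
    cleaned = dict(contact_data)
    if "contact_linkedin" in cleaned:
        cleaned["contact_linkedin"] = clean_contact_linkedin(cleaned["contact_linkedin"])
    return remove_empty_fields(cleaned)


sample = {
    "contact_linkedin": "https://linkedin.com/in/john-doe?param=123",
    "contact_email_valid": "john@example.com",
    "contact_xing": "",
}
cleaned = clean_contact_fields_sketch(sample)
print(cleaned)
```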

1.2 Company Field Cleaning (company_field_cleaning.py)

Purpose:

Normalizes and validates company field data including domains and social media URLs.

clean_company_fields(company_data: dict) → dict

Purpose:

Validates and normalizes all company fields including domains and social media URLs.

Logic Flow:

  1. Normalize and Strip All String Fields
  2. Clean LinkedIn URL - Extract company slug, format as https://www.linkedin.com/company/{slug}
  3. Clean Domain - Remove prefixes (https://, www.), store only base domain
  4. Clean Xing URL - Format as https://www.xing.com/pages/{slug}
  5. Clean Instagram URL - Format as https://www.instagram.com/{slug}
  6. Clean Facebook URL - Format as https://www.facebook.com/{slug}
  7. Clean Pinterest URL - Format as https://de.pinterest.com/{slug}
  8. Clean TikTok URL - Format as https://www.tiktok.com/{slug}
  9. Clean YouTube URL - Format as https://www.youtube.com/{slug}
  10. Clean Twitter URL - Format as https://x.com/{slug}
  11. Remove Empty Fields

Note: Try/except blocks around URL parsing for each platform. Malformed URLs are set to None.
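
As an illustration of the domain step, the sketch below strips the scheme and the www. prefix. Lowercasing and dropping any path or query string are assumptions about what "base domain" means here, and clean_domain is a hypothetical name.

```python
import re


def clean_domain(value):
    """Reduce a URL or host string to its bare domain, or None if empty."""
    if not isinstance(value, str):
        return None
    domain = value.strip().lower()
    domain = re.sub(r"^[a-z][a-z0-9+.-]*://", "", domain)  # drop scheme such as https://
    domain = re.sub(r"^www\.", "", domain)                  # drop leading www.
    domain = re.split(r"[/?#]", domain, maxsplit=1)[0]      # drop path, query, fragment
    return domain or None


print(clean_domain("https://www.Example.com/about?x=1"))
```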

Field Cleaning Detailed Documentation: For comprehensive field cleaning algorithms and platform-specific rules, see the Field Cleaning section earlier in this documentation.

2. Processing CSV Functions

2.1 Fetch CSV Fields (fetch_csv_fields.py)

Purpose:

Fetches and parses specific rows from CSV files stored in Supabase Storage with header mapping.

Connection Management:

  • _shared_pool: Global variable set by main script
  • set_connection_pool(pool): Sets the shared pool
  • get_pg_connection(): Gets connection from pool
  • release_pg_connection(conn): Returns connection to pool
fetch_csv_batch(file_name, start_row, end_row, csv_header_rows)

Logic Flow:

  1. Construct File Path - Build path: csv_uploads/{file_name}.csv
  2. Check File Exists - Query storage.objects table
  3. Download CSV File - HTTP GET from Supabase Storage
  4. Parse CSV Content - Create CSV reader from StringIO
  5. Validate Row Indices - Check bounds and adjust end_row
  6. Extract Batch Rows - Map column values to header names if mapping provided
  7. Return Batch Result - Include file_name, total_rows, range, and row data

Database Operations:

  • Table: storage.objects
  • Query: SELECT to check file existence
  • Connection: Uses shared pool, released after query

Error Scenarios:

  • Database Error: Returns error message, releases connection
  • File Not Found: Returns error "File not found in storage"
  • HTTP Error: Returns error with status code
  • Invalid Row Index: Returns error with valid range
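
Steps 4 to 7 (parse, validate, extract, return) can be sketched with the standard csv module. The sketch assumes 0-based row indices with an inclusive end_row that is clamped to the last row; the documentation does not state the indexing convention, and extract_batch, header_names, and the result keys are hypothetical.

```python
import csv
import io


def extract_batch(csv_text, start_row, end_row, header_names=None):
    """Return the rows in [start_row, end_row], optionally keyed by header name."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    total_rows = len(rows)
    if start_row < 0 or start_row >= total_rows:
        return {"error": f"Invalid start_row; valid range is 0 to {total_rows - 1}"}
    end_row = min(end_row, total_rows - 1)  # adjust end_row instead of failing
    batch = []
    for index in range(start_row, end_row + 1):
        values = rows[index]
        if header_names:
            batch.append(dict(zip(header_names, values)))  # map values to headers
        else:
            batch.append(values)
    return {"total_rows": total_rows, "start_row": start_row,
            "end_row": end_row, "rows": batch}


csv_text = "name,domain\nAcme,acme.com\nBeta,beta.io\n"
result = extract_batch(csv_text, 1, 10, ["company_name", "company_domain"])
print(result["rows"])
```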

2.2 Fetch Upload Job (fetch_upload_job.py)

Purpose:

Retrieves upload job metadata from ev_upload_jobs table with fallback to HTTP API.

Heroku Compatibility: Supports temporary connections before pool initialization. Falls back to HTTP API if PostgreSQL fails.

fetch_upload_job_by_id(job_id: str)

Fetches upload job data using PostgreSQL first, HTTP API as fallback.

Logic Flow:

  1. Try PostgreSQL fetch via _fetch_via_postgres()
  2. If fails, try HTTP fetch via _fetch_via_http()
  3. Return result or None if both methods fail
_fetch_via_postgres(job_id: str)

Logic Flow:

  1. Get connection from pool with RealDictCursor
  2. Query ev_upload_jobs by ID (id, workspace_id, user_id, data_type, job_type, mapped_csv_fields, csv_header_rows, etc.)
  3. Filter csv_header_rows to keep only the row at csv_header_row_position
  4. Check CSV file existence via _check_csv_exists()
  5. Reorder result dictionary (core fields first, metadata last)
  6. Release connection and return ordered result

Database Operations:

  • Tables: ev_upload_jobs, storage.objects
  • Query: SELECT by ID
  • Connection: Uses shared pool

2.3 Update Progress (update_progress.py)

Purpose:

Updates job status and progress data in ev_upload_jobs table.

update_job_status(job_id, job_status)

Updates the job_status field in ev_upload_jobs.

Query:

UPDATE ev_upload_jobs SET job_status = %s WHERE id = %s

Return: Tuple (success: bool, error_message: str or None)

update_job_progress(job_id, progress_data)

Updates the progress_data JSONB field in ev_upload_jobs.

Query:

UPDATE ev_upload_jobs SET progress_data = %s WHERE id = %s

Return: Tuple (success: bool, error_message: str or None)

2.4 Error CSV Push (error_csv_push.py)

Purpose:

Formats and uploads error CSV files to Supabase Storage csv_wrong folder.

push_error_csv(all_csv_errors, csv_file_name)

Main function to format, check, delete old, and upload error CSV.

Logic Flow:

  1. Initialize Result Dictionary
  2. Format Errors to CSV - Call format_errors_to_csv() with row_index, errors as first columns
  3. Check if File Exists - Query storage.objects
  4. Delete Existing File - If found, delete via _delete_error_csv()
  5. Upload New Error CSV - POST to Supabase Storage
  6. Return Result - Complete status of all operations

Return Value:

{error_file_found: bool, error_file_deleted: str|None, file_created: bool, file_name: str, message: str}
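
The formatting step (row_index and errors as the first two columns) might look like the sketch below. The structure of all_csv_errors is not documented, so the sketch assumes each entry is a dict with row_index, errors (a list of messages), and row (the original values keyed by column); that shape and the function name are assumptions.

```python
import csv
import io


def format_errors_to_csv_sketch(all_csv_errors):
    """Build CSV text with row_index and errors first, then the original columns."""
    data_columns = []
    for entry in all_csv_errors:
        for name in entry["row"]:
            if name not in data_columns:  # keep first-seen column order
                data_columns.append(name)
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["row_index", "errors"] + data_columns)
    for entry in all_csv_errors:
        writer.writerow(
            [entry["row_index"], "; ".join(entry["errors"])]
            + [entry["row"].get(name, "") for name in data_columns]
        )
    return buffer.getvalue()


error_csv = format_errors_to_csv_sketch(
    [{"row_index": 3, "errors": ["missing domain"], "row": {"company_name": "Acme"}}]
)
print(error_csv)
```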

3. Processing Lookup Functions

3.1 Contact Lookup (contact_lookup.py)

Purpose:

Searches for existing contacts/leads in v_contact_lookup view with two-phase lookup strategy.

lookup_contact(contact_data, company_id, workspace_id)

Two-phase lookup for contacts - first with company context, then without.

Phase 1: Lead Lookup (with company context)

  • Email - Check all email types (valid, catch_all, invalid, unsure) against email_array
  • LinkedIn + company_id - Check linkedin_array with lead_companies_main_id match
  • Xing + company_id - Check xing_array with lead_companies_main_id match
  • Name + company_id - Check first_name/last_name variations with company match

Phase 2: Person Lookup (fallback without company)

  • LinkedIn alone - Check linkedin_array without company constraint
  • Xing alone - Check xing_array without company constraint

Match Strategy: Phase 1 prioritizes lead matches (person at specific company). Phase 2 falls back to person matches (any company). Email has highest priority across all companies.

Return Value:

Tuple (lead_id, person_id, lead_workspace_id, people_workspace_id, lookup_result, error_message)

• Phase 1 success: Returns lead_id and person_id

• Phase 2 success: Returns person_id only (lead_id is None)

• Not found: Returns all None (not an error)

• Error: Returns all None with error message

Database Operations:

  • View: v_contact_lookup (main search)
  • Tables: db_leads_workspace, db_people_workspace (workspace lookups)
  • Queries: SELECT with parameterized OR conditions
  • Connection: Uses shared pool, always released
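
The two-phase strategy can be simulated in memory to show which IDs come back in each case. This sketch covers only the email and LinkedIn checks and leaves out Xing, name matching, and workspace lookups; the row keys lead_id and person_id and the function name are assumptions, while email_array, linkedin_array, and lead_companies_main_id come from the description above.

```python
def lookup_contact_sketch(rows, email=None, linkedin=None, company_id=None):
    """Return (lead_id, person_id) following the two-phase order."""
    # Phase 1a: email match has the highest priority, across all companies.
    if email:
        for row in rows:
            if email in row["email_array"]:
                return (row["lead_id"], row["person_id"])
    # Phase 1b: LinkedIn match restricted to the given company.
    if linkedin and company_id:
        for row in rows:
            if (linkedin in row["linkedin_array"]
                    and row["lead_companies_main_id"] == company_id):
                return (row["lead_id"], row["person_id"])
    # Phase 2: LinkedIn match without company constraint: person only.
    if linkedin:
        for row in rows:
            if linkedin in row["linkedin_array"]:
                return (None, row["person_id"])
    return (None, None)  # not found is a normal outcome


rows = [{
    "lead_id": "lead-456",
    "person_id": "person-789",
    "lead_companies_main_id": "company-a",
    "email_array": ["john@company.com"],
    "linkedin_array": ["https://www.linkedin.com/in/john-doe"],
}]
print(lookup_contact_sketch(rows, linkedin="https://www.linkedin.com/in/john-doe",
                            company_id="company-b"))
```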

3.2 Company Lookup (company_lookup.py)

Purpose:

Searches for existing companies in v_company_lookup view.

lookup_company(company_name, company_domain, company_linkedin, workspace_id)

Single-phase lookup for companies using ANY identifier match.

Lookup Conditions (OR logic):

  • Name: company_name_array match
  • Domain: company_domain_array match
  • LinkedIn: company_linkedin_array match

Match Strategy: Uses OR logic (any identifier matches). Returns first match (LIMIT 1). No ranking or priority between identifiers.

Return Value:

Tuple (company_main_id, company_workspace_id, error_message)

• Found: Returns company_main_id (and company_workspace_id if workspace_id provided)

• Not found: Returns (None, None, None)

• Error: Returns (None, None, error_message)

Database Operations:

  • View: v_company_lookup (main search)
  • Table: db_companies_workspace (workspace lookup)
  • Queries: SELECT with LIMIT 1 (returns first match)
  • Connection: Uses shared pool, always released
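
A parameterized query with OR conditions and LIMIT 1 could be assembled as in the sketch below. The array column names and the view name come from the description above; the selected column company_main_id, the use of = ANY(...) for array matching, and the function name are assumptions.

```python
def build_company_lookup_query(company_name=None, company_domain=None,
                               company_linkedin=None):
    """Return (sql, params) for the provided identifiers, or None if none given."""
    candidates = [
        ("company_name_array", company_name),
        ("company_domain_array", company_domain),
        ("company_linkedin_array", company_linkedin),
    ]
    conditions, params = [], []
    for column, value in candidates:
        if value:
            conditions.append(f"%s = ANY({column})")  # column names are fixed, values are parameters
            params.append(value)
    if not conditions:
        return None
    sql = (
        "SELECT company_main_id FROM v_company_lookup WHERE "
        + " OR ".join(conditions)
        + " LIMIT 1"
    )
    return sql, tuple(params)


query = build_company_lookup_query(company_domain="example.com")
print(query)
```

Because the query has no ORDER BY, which row counts as the "first match" is decided by the database, which is consistent with the note that no ranking is applied.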

4. Processing Push Functions

4.1 People Push (people_push.py)

Purpose:

Creates new person records in db_people and identifiers in db_people_identifiers.

create_person_record(contact_data, http_client)

Logic Flow:

  1. Map Fields for Insertion - Extract non-null values for names, demographics, birth, location, LinkedIn, career, etc.
  2. Validate Data - Return error if no fields to insert
  3. Build INSERT Query - Parameterized query with RETURNING *
  4. Execute INSERT - Insert into db_people, commit, extract person_id
  5. Track Identifiers - Collect contact_linkedin → 'linkedin', contact_xing → 'xing'
  6. Batch Insert Identifiers - Use executemany for db_people_identifiers
  7. Return - (person_id, None) on success

Database Operations:

  • Tables: db_people (main), db_people_identifiers (identifiers)
  • Queries: INSERT with RETURNING, Batch INSERT for identifiers
  • Transaction: Committed after each operation
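
Steps 1 to 3 (map fields, validate, build the INSERT) can be sketched as a pure function. The column whitelist below is hypothetical, since the actual db_people columns are not listed in this section; only the table name and the RETURNING * clause are taken from the flow above.

```python
# Hypothetical subset of db_people columns, used as a whitelist so that
# only known column names are ever interpolated into the SQL text.
ALLOWED_PEOPLE_COLUMNS = ("first_name", "last_name", "gender", "language")


def build_person_insert(contact_data):
    """Return ((sql, params), None) or (None, error_message)."""
    fields = {
        column: contact_data[column]
        for column in ALLOWED_PEOPLE_COLUMNS
        if contact_data.get(column) is not None
    }
    if not fields:
        return None, "No fields to insert"
    columns = ", ".join(fields)
    placeholders = ", ".join(["%s"] * len(fields))
    sql = f"INSERT INTO db_people ({columns}) VALUES ({placeholders}) RETURNING *"
    return (sql, tuple(fields.values())), None


statement, error = build_person_insert(
    {"first_name": "John", "last_name": None, "gender": "m"}
)
print(statement)
```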

4.2 Lead Push (lead_push.py)

Purpose:

Creates new lead records in db_leads connecting person to company, with email identifiers.

create_lead_record(contact_data, person_id, company_main_id, http_client)

Logic Flow:

  1. Map Regular Fields - Position, classification, timeline, summary
  2. Map Array Fields - lead_sources (convert string to array if needed)
  3. Build INSERT Query - Include people_id and companies_main_id references
  4. Execute INSERT - Insert into db_leads, commit, extract lead_id
  5. Track Email Identifiers - For each email field, determine status (valid, catch_all, invalid, unsure)
  6. Batch Insert Emails - Use executemany for db_leads_identifiers with status
  7. Return - (lead_id, None) on success

Database Operations:

  • Tables: db_leads (main), db_leads_identifiers (emails)
  • Queries: INSERT with RETURNING, Batch INSERT for emails
  • Transaction: Committed after each operation
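
The email tracking step pairs each email field with a status before the batch insert. In the sketch below, contact_email_valid appears elsewhere in this documentation; the other three field names are assumed to follow the same naming pattern, and collect_email_identifiers is a hypothetical helper. The column names in the INSERT are the ones used by the delete flow for db_leads_identifiers.

```python
# Field-to-status mapping; all but contact_email_valid are assumed names.
EMAIL_FIELD_STATUS = {
    "contact_email_valid": "valid",
    "contact_email_catch_all": "catch_all",
    "contact_email_invalid": "invalid",
    "contact_email_unsure": "unsure",
}

INSERT_EMAIL_SQL = (
    "INSERT INTO db_leads_identifiers "
    "(leads_id, lead_ident_identifier, lead_ident_type, lead_ident_status) "
    "VALUES (%s, %s, %s, %s)"
)


def collect_email_identifiers(contact_data, lead_id):
    """Return parameter tuples suitable for cursor.executemany(INSERT_EMAIL_SQL, rows)."""
    rows = []
    for field, status in EMAIL_FIELD_STATUS.items():
        email = contact_data.get(field)
        if email:
            rows.append((lead_id, email, "email", status))
    return rows


email_rows = collect_email_identifiers(
    {"contact_email_valid": "john@example.com", "contact_email_unsure": "j.doe@example.com"},
    "lead-456",
)
print(email_rows)
```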

4.3 People Workspace Push (people_workspace_push.py)

Purpose:

Creates workspace connections for people in db_people_workspace.

create_people_workspace_connection(people_id, workspace_id, http_client)

Creates people-workspace connection record.

Query:

INSERT INTO db_people_workspace (people_id, workspace_id) VALUES (%s, %s) RETURNING id

Return: Tuple (people_workspace_id, error_message)

4.4 Lead Workspace Push (lead_workspace_push.py)

Purpose:

Creates workspace connections for leads in db_leads_workspace with optional qualification.

create_lead_workspace_connection(lead_id, workspace_id, lead_qualified_ws, http_client)

Creates lead-workspace connection with optional qualification status.

Logic:

  • Always include: leads_id, workspace_id
  • If lead_qualified_ws provided: Add to fields and params
  • Build dynamic INSERT query with RETURNING id

Return: Tuple (lead_workspace_id, error_message)

4.5 Company Workspace Push (company_workspace_push.py)

Purpose:

Creates workspace connections for companies in db_companies_workspace.

push_company_workspace(company_data, workspace_id, company_main_id)

Creates company-workspace connection with optional qualification and tags.

Fields:

  • Always: workspaces_id, companies_main_id
  • Optional: company_qualified (boolean)
  • Optional: company_custom_tags_ws (text[] - converted from comma-separated string)

Return: Tuple (company_workspace_id, error_message)

4.6 Company Push (company_push.py)

Purpose:

Creates new company records in db_companies_main and identifiers in db_companies_dt_identifiers.

push_company(company_data)

Logic Flow:

  1. Define Identifier Mappings - name, domain, linkedin, email, phone, social media
  2. Map Main Table Fields - Company details, address, registration, metrics, media
  3. Map Array Fields - company_tags (text[]), company_sources (data_sources[])
  4. Build INSERT Query - If main fields exist: INSERT with casts (%s::text[], %s::data_sources[]). If only identifiers: INSERT DEFAULT VALUES
  5. Execute Main Insert - Insert into db_companies_main, commit, extract company_id
  6. Collect and Insert Identifiers - Batch INSERT for all identifier types
  7. Return - (company_id, None) on success

Identifier Types Supported:

  • name
  • domain
  • linkedin
  • email
  • phone
  • instagram
  • facebook
  • xing
  • pinterest
  • tiktok
  • youtube
  • twitter

Database Operations:

  • Tables: db_companies_main (main), db_companies_dt_identifiers (identifiers)
  • Queries: INSERT with RETURNING (or DEFAULT VALUES), Batch INSERT for identifiers
  • Transaction: Committed after each operation

5. Processing Update Functions

Non-Destructive Update Strategy: All update functions use additive operations. Regular fields are updated only if new value provided. Array fields append new items using array_cat() or || operator. Identifiers are only inserted (never deleted). Existing data is always preserved.

5.1 People Update (people_update.py)

Purpose:

Updates existing person records and adds new identifiers (non-destructive).

update_person_identifiers(contact_data, person_id)

Logic Flow:

  1. Map Fields for Update - Same field mappings as people_push
  2. Build UPDATE Query - If fields exist: UPDATE db_people SET field1 = %s, ... WHERE id = %s
  3. Fetch Existing Identifiers - SELECT from db_people_identifiers
  4. Identify New Identifiers - Check if (identifier, type) NOT in existing set
  5. Batch Insert New Identifiers - Use executemany
  6. Return - (fields_updated: bool, identifiers_added: int, error_message)

Non-Destructive: Keeps current field values if new value not provided. Only adds new identifiers, never deletes existing ones.
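
Step 4 (identify new identifiers) is a set-membership check on (identifier, type) pairs. A minimal sketch, with find_new_identifiers as a hypothetical helper name and the field-to-type mapping taken from the people push description:

```python
def find_new_identifiers(contact_data, existing):
    """Return (identifier, type) pairs that are not yet stored for the person."""
    mapping = {"contact_linkedin": "linkedin", "contact_xing": "xing"}
    new_rows = []
    for field, ident_type in mapping.items():
        value = contact_data.get(field)
        if value and (value, ident_type) not in existing:
            new_rows.append((value, ident_type))
    return new_rows


existing = {("https://www.linkedin.com/in/john-doe", "linkedin")}
new_rows = find_new_identifiers(
    {
        "contact_linkedin": "https://www.linkedin.com/in/john-doe",
        "contact_xing": "https://www.xing.com/people/john-doe",
    },
    existing,
)
print(new_rows)
```

Only the Xing entry would be inserted here; the LinkedIn entry already exists and is left untouched.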

5.2 Lead Update (lead_update.py)

Purpose:

Updates existing lead records and adds new email identifiers (non-destructive).

update_lead_identifiers(contact_data, lead_id)

Logic Flow:

  1. Fetch Existing Record - Get lead_sources array
  2. Map Regular Fields - Same as lead_push
  3. Handle Array Field - Find new items not in existing lead_sources, use array_cat() to append
  4. Execute UPDATE - Update fields and append to arrays
  5. Fetch Existing Email Identifiers - Build dict of (identifier, type) → status
  6. Process Email Identifiers - New emails → inserts list, Changed status → updates list
  7. Batch Operations - INSERT new emails, UPDATE changed statuses
  8. Return - (fields_updated: bool, identifiers_added: int, error_message)

Non-Destructive: Regular fields keep current values if new not provided. Arrays append new items. Email identifiers are added or status updated (never deleted).
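
Steps 5 and 6 split incoming emails into an inserts list and an updates list. The sketch below assumes both sides are dicts keyed by (identifier, type) with the status as value, as step 5 describes; the function name and the tuple layouts of the two result lists are assumptions.

```python
def diff_email_identifiers(incoming, existing):
    """Return (inserts, updates) for email identifiers.

    inserts: (email, type, status) for emails not stored yet
    updates: (new_status, email, type) for emails whose status changed
    """
    inserts, updates = [], []
    for (email, ident_type), status in incoming.items():
        key = (email, ident_type)
        if key not in existing:
            inserts.append((email, ident_type, status))
        elif existing[key] != status:
            updates.append((status, email, ident_type))
    return inserts, updates


existing_emails = {("john@example.com", "email"): "unsure"}
incoming_emails = {
    ("john@example.com", "email"): "valid",     # status changed
    ("j.doe@example.com", "email"): "catch_all",  # new email
}
inserts, updates = diff_email_identifiers(incoming_emails, existing_emails)
print(inserts, updates)
```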

5.3 Lead Workspace Update (lead_workspace_update.py)

Purpose:

Updates lead workspace connection fields in db_leads_workspace.

update_lead_workspace(lead_workspace_id, lead_qualified_ws)

Updates qualification status for lead workspace connection.

Query:

UPDATE db_leads_workspace SET lead_qualified_ws = %s WHERE id = %s

Return: Tuple (workspace_updated: bool, error_message)

5.4 People Workspace Update (people_workspace_update.py)

Purpose:

Updates people workspace connection fields (currently no updateable fields).

update_people_workspace(contact_data, people_workspace_id)

Placeholder for future workspace field updates. Currently returns success immediately.

Note: db_people_workspace currently only contains (id, people_id, workspace_id). No updateable fields exist. Function exists for consistency and future extensibility.

Return: Tuple (True, None)

5.5 Company Update (company_update.py)

Purpose:

Updates existing company records and adds new identifiers (non-destructive).

update_company_identifiers(company_data, company_main_id)

Logic Flow:

  1. Fetch Existing Record - Get company_tags and company_sources arrays
  2. Map Regular Fields - Same as company_push
  3. Handle Array Fields - Find new items, use || operator with explicit cast (company_tags || %s::text[], company_sources || %s::data_sources[])
  4. Execute UPDATE - Update fields and append to arrays
  5. Fetch Existing Identifiers - Build set of (identifier, type) tuples
  6. Identify New Identifiers - Check all identifier mappings against existing set
  7. Batch Insert New Identifiers - Use executemany
  8. Return - (fields_updated: bool, identifiers_added: int, error_message)

Non-Destructive: Regular fields keep current values. Arrays append new items using || operator with cast. Identifiers are only added, never deleted.

5.6 Company Workspace Update (company_workspace_update.py)

Purpose:

Updates company workspace connection fields (qualification and tags).

update_company_workspace(company_data, company_workspace_id)

Updates workspace qualification and appends custom tags (non-destructive).

Logic:

  1. Fetch existing company_custom_tags_ws array
  2. If company_qualified provided: Add to SET clause
  3. If company_custom_tags_ws provided: Find new tags, append using || operator with ::text[] cast
  4. Execute UPDATE if fields exist
  5. Return (workspace_updated: bool, error_message)

Non-Destructive: company_qualified updates to new value. company_custom_tags_ws appends new tags without removing existing ones.
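
Steps 2 to 4 can be sketched as a builder for the dynamic UPDATE statement. It assumes company_custom_tags_ws arrives as a comma-separated string (as in the push function) and that the row is addressed by id; build_company_workspace_update is a hypothetical name. psycopg2 adapts a Python list to a PostgreSQL array, so the new tags can be passed as a single parameter.

```python
def build_company_workspace_update(company_data, existing_tags, company_workspace_id):
    """Return (sql, params), or None when there is nothing to update."""
    set_clauses, params = [], []
    if company_data.get("company_qualified") is not None:
        set_clauses.append("company_qualified = %s")
        params.append(company_data["company_qualified"])
    raw_tags = company_data.get("company_custom_tags_ws")
    if raw_tags:
        tags = [tag.strip() for tag in raw_tags.split(",") if tag.strip()]
        # Deduplicate while keeping order, then drop tags that already exist.
        new_tags = [tag for tag in dict.fromkeys(tags) if tag not in existing_tags]
        if new_tags:
            set_clauses.append(
                "company_custom_tags_ws = company_custom_tags_ws || %s::text[]"
            )
            params.append(new_tags)
    if not set_clauses:
        return None
    params.append(company_workspace_id)
    sql = f"UPDATE db_companies_workspace SET {', '.join(set_clauses)} WHERE id = %s"
    return sql, params


update = build_company_workspace_update(
    {"company_qualified": True, "company_custom_tags_ws": "tag1, tag2"},
    ["tag1"],
    "cw-1",
)
print(update)
```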

Summary of Patterns and Best Practices

Connection Management Pattern

All functions follow this pattern:

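import psycopg2
from psycopg2.extras import RealDictCursor

def run_db_operation():  # illustrative wrapper so the return statements are valid
    conn = None
    cursor = None
    try:
        conn = get_pg_connection()
        cursor = conn.cursor(cursor_factory=RealDictCursor)
        result = None  # placeholder: set by the database operations below
        # ... database operations ...
        conn.commit()
        cursor.close()
        release_pg_connection(conn)
        return (result, None)
    except psycopg2.Error as e:
        # Cleanup is done here in the except block, not in a finally clause
        if conn:
            conn.rollback()
            if cursor:
                cursor.close()
            release_pg_connection(conn)
        return (None, f"Database error: {e}")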

Non-Destructive Update Strategy

All update functions use additive operations:

  • Regular fields: Update only if new value provided
  • Array fields: Append new items using array_cat() or || operator
  • Identifiers: Only insert new ones, never delete existing
  • Example: If existing tags = ["tag1"], input tags = ["tag2"], result = ["tag1", "tag2"]

Error Handling Strategy

Cleaning Functions:

  • Try/except around URL parsing
  • Return None for invalid data
  • Never raise exceptions

Lookup Functions:

  • Return None for not found (not an error)
  • Return error message for database failures
  • Always release connections

Push/Update Functions:

  • Rollback transactions on error
  • Return tuple (result, error_message)
  • Always release connections
  • Cascade returns (partial success allowed)

Parameterized Queries

All database queries use parameterized placeholders:

  • Prevents SQL injection
  • Handles special characters
  • Example: WHERE id = %s, params: (id_value,)

Batch Operations

Use executemany for multiple inserts/updates:

  • More efficient than multiple execute() calls
  • Used for identifiers (people, leads, companies)
  • Used for email identifiers with status
cursor.executemany(query, [(data1,), (data2,), ...])
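
To make the batch pattern runnable without a PostgreSQL server, the sketch below uses Python's built-in sqlite3 module as a stand-in for psycopg2. The executemany call has the same shape in both drivers, but sqlite3 uses ? placeholders where psycopg2 uses %s. The column names are the ones used by the delete flow for db_people_identifiers; the sample values are made up.

```python
import sqlite3

# In-memory database used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE db_people_identifiers ("
    "people_id TEXT, contact_ident_identifier TEXT, contact_ident_type TEXT)"
)

rows = [
    ("person-789", "https://www.linkedin.com/in/john-doe", "linkedin"),
    ("person-789", "https://www.xing.com/people/john-doe", "xing"),
]

# One call inserts every row; each tuple supplies the parameters for one row.
conn.executemany(
    "INSERT INTO db_people_identifiers "
    "(people_id, contact_ident_identifier, contact_ident_type) VALUES (?, ?, ?)",
    rows,
)
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM db_people_identifiers").fetchone()[0]
conn.close()
print(count)
```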

Auth

General Structure