API Documentation

Complete API reference for Grundwerk Digital platform integration. Build powerful integrations using our RESTful API.

v1.0 REST API

Base URL: https://your-domain.com

Authentication

All API endpoints require authentication using an API key. Include your API key in the request header with each request.

Quick Start

API Key Header

api_key: {{your_api_key}}
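As a minimal sketch of attaching this header from Python, the snippet below builds (but does not send) a JSON POST request using only the standard library. The helper name `build_json_request` and the key value `YOUR_API_KEY` are illustrative assumptions, not part of the API.

```python
import json
from urllib.request import Request


def build_json_request(url, api_key, body):
    """Build (but do not send) a POST request carrying the api_key header.

    Hypothetical helper for illustration; not part of the API.
    """
    data = json.dumps(body).encode("utf-8")
    req = Request(url, data=data, method="POST")
    # The key goes in a header literally named "api_key".
    req.add_header("api_key", api_key)
    req.add_header("Content-Type", "application/json")
    return req


req = build_json_request(
    "https://web-production-603a8.up.railway.app/company_lookup",
    "YOUR_API_KEY",  # placeholder, replace with a real key
    {"company_domain": "acme.com"},
)
print(req.get_method(), req.full_url)
```

Passing the returned object to `urllib.request.urlopen` would send it. Note that `urllib` normalizes the stored header name to `Api_key`; HTTP header names are case-insensitive, so this should not matter to a standards-compliant server.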

Base URL

All API requests should be made to the following base URL:

Production Environment

https://web-production-603a8.up.railway.app

Companies

Manage company records in your database. Create, read, update, and delete company information including business details, contact information, and related metadata.

GET /company_get

Get a single company record by its company_id.

Required: The following fields are mandatory

company_id

Query Parameters

company_id (required)

uuid

The unique identifier of the company

Example: 00000000-0000-0000-0000-000000000000

workspace_id (client_data)

uuid

The workspace identifier

Example: 00000000-0000-0000-0000-000000000000

GET

curl -X GET "https://web-production-603a8.up.railway.app/company_get?company_id=00000000-0000-0000-0000-000000000000&workspace_id=00000000-0000-0000-0000-000000000000" \
  -H "api_key: {{your_api_key}}" \
  -H "Content-Type: application/json"
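The same URL can be assembled programmatically. The sketch below is one possible approach, assuming workspace_id may be omitted (only company_id is listed as mandatory); `company_get_url` is a hypothetical helper name, and the UUID check is a client-side convenience rather than documented server behavior.

```python
from urllib.parse import urlencode
from uuid import UUID

# Production base URL as listed in this reference.
BASE_URL = "https://web-production-603a8.up.railway.app"


def company_get_url(company_id, workspace_id=None):
    """Return the /company_get URL for the given identifiers.

    Hypothetical helper; raises ValueError if an ID is not a valid UUID.
    """
    # UUID() raises ValueError on malformed input, so bad IDs fail early.
    params = {"company_id": str(UUID(company_id))}
    if workspace_id is not None:
        params["workspace_id"] = str(UUID(workspace_id))
    return BASE_URL + "/company_get?" + urlencode(params)


print(company_get_url("00000000-0000-0000-0000-000000000000"))
```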

Response Parameters

company_name_cleaned (string)

Cleaned/standardized version of the company name

Example: Example Company GmbH

company_legal_form (string)

Legal form of the company (e.g., GmbH, AG, LLC)

Example: GmbH

b2b_b2c (string)

Business model classification

Example: B2B

company_name_imprint (string)

Company name as it appears in official imprint/legal notices

Example: Example Company GmbH

company_street (string)

Street name of company address

Example: Hauptstraße

company_street_nr (string)

Street number of company address

Example: 123

company_city (string)

City where company is located

Example: Berlin

company_zip (string)

Postal/ZIP code

Example: 10115

company_region (string)

Region/state where company is located

Example: Berlin

company_country (string)

Country where company is located

Example: Germany

company_tax_nr (string)

Tax identification number (Steuernummer)

Example: 12/345/67890

company_vat_nr (string)

VAT identification number (Umsatzsteuer-ID)

Example: DE123456789

company_handels_register (string)

Commercial register number (Handelsregisternummer)

Example: HRB 12345

company_employees_research (string)

Employee count from research

Example: 50-100

company_employees_linkedin (string)

Employee count from LinkedIn

Example: 75

company_founded_year (string)

Year the company was founded

Example: 2010

company_description (string)

Company description/about text

Example: Leading provider of software solutions

company_logo_url (string)

URL to company logo image

Example: https://example.com/logo.png

company_size_linkedin (string)

Company size range from LinkedIn

Example: 51-200 employees

company_linkedin_followers (string)

Number of LinkedIn followers

Example: 1234

company_tags (array)

Array of tags/categories for the company

Example: ["tech","software","b2b"]

company_sources (array)

Array of data sources

Example: ["website","linkedin","research"]

db_companies_main_created_at (string)

Timestamp when company record was created (ISO 8601 format)

Example: 2024-01-15T10:30:00Z

db_companies_main_updated_at (string)

Timestamp when company record was last updated (ISO 8601 format)

Example: 2024-03-20T14:45:00Z

company_names (array)

All company names associated with this company. Always present (empty array if none)

Example: ["Example Company","Example Co.","Example GmbH"]

company_domains (array)

All domain names associated with this company. Always present (empty array if none)

Example: ["example.com","example.de"]

company_emails (array)

All email addresses associated with this company. Always present (empty array if none)

Example: ["info@example.com","contact@example.com"]

company_phones (array)

All phone numbers associated with this company. Always present (empty array if none)

Example: ["+49301234567","+49301234568"]

company_linkedins (array)

All LinkedIn URLs associated with this company. Always present (empty array if none)

Example: ["https://linkedin.com/company/example"]

company_instagrams (array)

All Instagram URLs associated with this company. Always present (empty array if none)

Example: ["https://instagram.com/example"]

company_facebooks (array)

All Facebook URLs associated with this company. Always present (empty array if none)

Example: ["https://facebook.com/example"]

company_xings (array)

All Xing URLs associated with this company. Always present (empty array if none)

Example: []

company_pinterests (array)

All Pinterest URLs associated with this company. Always present (empty array if none)

Example: []

company_tiktoks (array)

All TikTok URLs associated with this company. Always present (empty array if none)

Example: []

company_youtubes (array)

All YouTube URLs associated with this company. Always present (empty array if none)

Example: ["https://youtube.com/@example"]

company_twitters (array)

All Twitter/X URLs associated with this company. Always present (empty array if none)

Example: ["https://twitter.com/example"]

company_workspace_id (uuid)

UUID of the workspace connection. null if workspace_id not provided in request OR workspace connection doesn't exist

Example: 00000000-0000-0000-0000-000000000000

company_qualified (string)

Qualification status in workspace. null if not set or no workspace connection

Example: yes

company_custom_tags_ws (array)

Custom workspace-specific tags. null if not set or no workspace connection

Example: ["priority","enterprise"]

db_companies_workspace_created_at (string)

Timestamp when workspace connection was created (ISO 8601 format). null if no workspace connection

Example: 2024-01-20T11:00:00Z

db_companies_workspace_updated_at (string)

Timestamp when workspace connection was last updated (ISO 8601 format). null if no workspace connection

Example: 2024-03-22T16:30:00Z

Response

{
  "company_name_cleaned": "Example Company GmbH",
  "company_legal_form": "GmbH",
  "b2b_b2c": "B2B",
  "company_name_imprint": "Example Company GmbH",
  "company_street": "Hauptstraße",
  "company_street_nr": "123",
  "company_city": "Berlin",
  "company_zip": "10115",
  "company_region": "Berlin",
  "company_country": "Germany",
  "company_tax_nr": "12/345/67890",
  "company_vat_nr": "DE123456789",
  "company_handels_register": "HRB 12345",
  "company_employees_research": "50-100",
  "company_employees_linkedin": "75",
  "company_founded_year": "2010",
  "company_description": "Leading provider of software solutions",
  "company_logo_url": "https://example.com/logo.png",
  "company_size_linkedin": "51-200 employees",
  "company_linkedin_followers": "1234",
  "company_tags": [
    "tech",
    "software",
    "b2b"
  ],
  "company_sources": [
    "website",
    "linkedin",
    "research"
  ],
  "db_companies_main_created_at": "2024-01-15T10:30:00Z",
  "db_companies_main_updated_at": "2024-03-20T14:45:00Z",
  "company_names": [
    "Example Company",
    "Example Co.",
    "Example GmbH"
  ],
  "company_domains": [
    "example.com",
    "example.de"
  ],
  "company_emails": [
    "info@example.com",
    "contact@example.com"
  ],
  "company_phones": [
    "+49301234567",
    "+49301234568"
  ],
  "company_linkedins": [
    "https://linkedin.com/company/example"
  ],
  "company_instagrams": [
    "https://instagram.com/example"
  ],
  "company_facebooks": [
    "https://facebook.com/example"
  ],
  "company_xings": [],
  "company_pinterests": [],
  "company_tiktoks": [],
  "company_youtubes": [
    "https://youtube.com/@example"
  ],
  "company_twitters": [
    "https://twitter.com/example"
  ],
  "company_workspace_id": "00000000-0000-0000-0000-000000000000",
  "company_qualified": "yes",
  "company_custom_tags_ws": [
    "priority",
    "enterprise"
  ],
  "db_companies_workspace_created_at": "2024-01-20T11:00:00Z",
  "db_companies_workspace_updated_at": "2024-03-22T16:30:00Z"
}
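When consuming this response, the identifier arrays are documented as always present, while the workspace fields may be null. The sketch below shows one way to handle both cases in Python; the sample object is abbreviated with placeholder values, and the helper names are hypothetical.

```python
import json

# Abbreviated sample shaped like the response above (placeholder values),
# here for a request without a workspace connection.
sample = json.loads("""
{
  "company_name_cleaned": "Example Company GmbH",
  "company_domains": ["example.com", "example.de"],
  "company_xings": [],
  "company_workspace_id": null,
  "company_qualified": null,
  "company_custom_tags_ws": null
}
""")


def has_workspace_connection(company):
    """True when the response carries a workspace connection."""
    return company.get("company_workspace_id") is not None


def workspace_tags(company):
    """Workspace tags as a list; null in the response becomes []."""
    return company.get("company_custom_tags_ws") or []


print(has_workspace_connection(sample))
print(workspace_tags(sample))
print(sample["company_xings"])
```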

POST /company_lookup

Look up whether a company exists, and push it if it does not exist.

Required: At least one of the following conditions must be met

company_name OR company_domain OR company_linkedin

Request Body

workspace_id (client_data)

uuid

The workspace identifier

Example: 00000000-0000-0000-0000-000000000000

company_name (at least 1 required)

string

The name of the company to lookup

Example: Acme Corporation

company_domain (at least 1 required)

string

The domain of the company to lookup

Example: acme.com

company_linkedin (at least 1 required)

string

The LinkedIn URL of the company to lookup

Example: https://linkedin.com/company/acme

POST

curl -X POST "https://web-production-603a8.up.railway.app/company_lookup" \
  -H "api_key: {{your_api_key}}" \
  -H "Content-Type: application/json" \
  -d '{
  "workspace_id": "00000000-0000-0000-0000-000000000000",
  "company_name": "Acme Corporation",
  "company_domain": "acme.com",
  "company_linkedin": "https://linkedin.com/company/acme"
}'

Response Parameters

company_main_id (uuid)

UUID of the company in the db_companies_main table. Returns UUID if company found, null if not found.

Example: 00000000-0000-0000-0000-000000000000

company_workspace_id (uuid)

UUID of the company-workspace connection in db_companies_workspace table. Only populated if workspace_id was provided in request. Returns UUID if workspace connection exists, null if not found.

Example: 00000000-0000-0000-0000-000000000000

error (string)

Error message if lookup failed. null if no error occurred.

Response

{
  "company_main_id": "283f9e1f-89dd-4032-8e02-65acc6856ed1",
  "company_workspace_id": "00000000-0000-0000-0000-000000000000",
  "error": null
}
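The "at least one of" rule for this endpoint can be checked on the client before sending. The following is a minimal sketch, assuming a hypothetical helper `build_lookup_payload`; the server may validate differently.

```python
# The three identifier fields, of which at least one must be supplied.
LOOKUP_KEYS = ("company_name", "company_domain", "company_linkedin")


def build_lookup_payload(workspace_id=None, **identifiers):
    """Build a /company_lookup body, enforcing the at-least-one rule.

    Hypothetical client-side check; the server may validate differently.
    """
    # Keep only recognized identifier fields with non-empty values.
    payload = {k: v for k, v in identifiers.items() if k in LOOKUP_KEYS and v}
    if not payload:
        raise ValueError("provide at least one of: " + ", ".join(LOOKUP_KEYS))
    if workspace_id:
        payload["workspace_id"] = workspace_id
    return payload


print(build_lookup_payload(company_domain="acme.com"))
```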

POST /company_push

Push new company data to the database.

Required: At least one of the following conditions must be met

company_name OR company_domain OR company_linkedin

Request Body

workspace_id (client_data)

uuid

The workspace identifier

Example: 00000000-0000-0000-0000-000000000000

company_name (at least 1 required)

string

Company name

Example: Acme Corporation

company_name_cleaned (optional)

string

Cleaned company name

Example: Acme Corporation

company_domain (at least 1 required)

string

Company domain

Example: acme.com

company_linkedin (at least 1 required)

string

Company LinkedIn URL

Example: https://linkedin.com/company/acme

company_email (optional)

string

Company email address

Example: info@acme.com

company_phone (optional)

string

Company phone number

Example: +1234567890

company_instagram (optional)

string

Company Instagram URL

Example: https://instagram.com/acme

company_facebook (optional)

string

Company Facebook URL

Example: https://facebook.com/acme

company_xing (optional)

string

Company XING URL

Example: https://xing.com/companies/acme

company_pinterest (optional)

string

Company Pinterest URL

Example: https://pinterest.com/acme

company_tiktok (optional)

string

Company TikTok URL

Example: https://tiktok.com/@acme

company_youtube (optional)

string

Company YouTube URL

Example: https://youtube.com/@acme

company_twitter (optional)

string

Company Twitter/X URL

Example: https://twitter.com/acme

company_legal_form (optional)

string

Company legal form

Example: Inc.

b2b_b2c (optional)

enum: b2b_b2c_type

Business model (B2B or B2C)

Example: B2B

Allowed values:

B2B, B2C, both

company_imprint_name (optional)

string

Company imprint name

Example: Acme Corporation Inc.

company_street (optional)

string

Company street address

Example: Main Street

company_street_nr (optional)

string

Company street number

Example: 123

company_city (optional)

string

Company city

Example: München

company_zip (optional)

string

Company zip code

Example: 10001

company_region (optional)

string

Company region/state

Example: Bayern

company_country (optional)

string

Company country

Example: Germany

company_steuer_nr (optional)

string

Company tax number

Example: 30/321/50964

company_vat_nr (optional)

string

Company VAT number

Example: DE123456789

company_register_nr (optional)

string

Company registration number

Example: HRB 12345

employees_research (optional)

int8

Number of employees from research

Example: 75

employees_linkedin (optional)

int8

Number of employees from LinkedIn

Example: 75

company_founded_year (optional)

int8

Year company was founded

Example: 2010

description (optional)

string

Company description

Example: Leading provider of innovative solutions

company_logo_url (optional)

string

Company logo URL

Example: https://acme.com/logo.png

company_size_linkedin (optional)

string

Company size from LinkedIn

Example: 51-200

company_linkedin_followers (optional)

int8

Number of LinkedIn followers

Example: 1500

company_tags (optional)

enum: text (array)

Company tags

Example: ["technology","saas"]

company_sources (optional)

enum: data_sources (array)

Data sources

Example: ["linkedin","website"]

Allowed values:

lusha, clay, apollo, north_data, d7_lead_finder, storeleads, build_with, sales_navigator

company_qualified (optional)

enum: pending_boolean

Qualification status

Example: qualified

Allowed values:

qualified, pending, not_qualified

company_custom_tags_ws (optional)

enum: text (array)

Custom workspace tags

Example: ["vip","partner"]

POST

curl -X POST "https://web-production-603a8.up.railway.app/company_push" \
  -H "api_key: {{your_api_key}}" \
  -H "Content-Type: application/json" \
  -d '{
  "workspace_id": "00000000-0000-0000-0000-000000000000",
  "company_name": "Acme Corporation",
  "company_name_cleaned": "Acme Corporation",
  "company_domain": "acme.com",
  "company_linkedin": "https://linkedin.com/company/acme",
  "company_email": "info@acme.com",
  "company_phone": "+1234567890",
  "company_instagram": "https://instagram.com/acme",
  "company_facebook": "https://facebook.com/acme",
  "company_xing": "https://xing.com/companies/acme",
  "company_pinterest": "https://pinterest.com/acme",
  "company_tiktok": "https://tiktok.com/@acme",
  "company_youtube": "https://youtube.com/@acme",
  "company_twitter": "https://twitter.com/acme",
  "company_legal_form": "Inc.",
  "b2b_b2c": "B2B",
  "company_imprint_name": "Acme Corporation Inc.",
  "company_street": "Main Street",
  "company_street_nr": "123",
  "company_city": "München",
  "company_zip": "10001",
  "company_region": "Bayern",
  "company_country": "Germany",
  "company_steuer_nr": "30/321/50964",
  "company_vat_nr": "DE123456789",
  "company_register_nr": "HRB 12345",
  "employees_research": 75,
  "employees_linkedin": 75,
  "company_founded_year": 2010,
  "description": "Leading provider of innovative solutions",
  "company_logo_url": "https://acme.com/logo.png",
  "company_size_linkedin": "51-200",
  "company_linkedin_followers": 1500,
  "company_tags": [
    "technology",
    "saas"
  ],
  "company_sources": [
    "linkedin",
    "website"
  ],
  "company_qualified": "qualified",
  "company_custom_tags_ws": [
    "vip",
    "partner"
  ]
}'
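Because several request fields accept only a fixed set of values, it can help to check them before sending. The sketch below covers b2b_b2c and company_qualified using the allowed values listed in this reference; `check_enums` is a hypothetical helper, and whether the server rejects other values is an assumption.

```python
# Allowed values as listed in this reference for two enum fields.
ALLOWED = {
    "b2b_b2c": {"B2B", "B2C", "both"},
    "company_qualified": {"qualified", "pending", "not_qualified"},
}


def check_enums(payload):
    """Return a list of human-readable problems; empty if none found."""
    problems = []
    for field, allowed in ALLOWED.items():
        value = payload.get(field)
        # Absent fields are fine because both are optional.
        if value is not None and value not in allowed:
            problems.append(f"{field}: {value!r} not in {sorted(allowed)}")
    return problems


print(check_enums({"company_name": "Acme Corporation", "b2b_b2c": "B2B"}))
print(check_enums({"company_domain": "acme.com", "company_qualified": "yes"}))
```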

Response Parameters

company_main_id (uuid)

UUID of the company (either found or newly created). null only if operation failed.

Example: 00000000-0000-0000-0000-000000000000

company_workspace_id (uuid)

UUID of the company-workspace connection (either found or newly created). Only populated if workspace_id was provided in request. null if workspace operation not performed or failed.

Example: 00000000-0000-0000-0000-000000000000

status_company (enum)

Status of the company record operation. Can be "found" (company already existed), "created" (new company created), or null (operation failed).

Example: created

Possible values:

found, created

status_company_workspace (enum)

Status of the workspace connection operation. Can be "found" (workspace connection already existed), "created" (new workspace connection created), or null (not performed or failed). Only relevant when workspace_id provided in request.

Example: created

Possible values:

found, created

error (string)

Error message if any operation failed. null if all operations succeeded.

Response

{
  "company_main_id": "283f9e1f-89dd-4032-8e02-65acc6856ed1",
  "company_workspace_id": "00000000-0000-0000-0000-000000000000",
  "status_company": "created",
  "status_company_workspace": "created",
  "error": null
}

POST /company_push_patch

Update existing company data in the database.

Required: At least one of the following conditions must be met

company_name OR company_domain OR company_linkedin

Request Body

workspace_id (client_data)

uuid

The workspace identifier

Example: 00000000-0000-0000-0000-000000000000

company_name (at least 1 required)

string

Company name

Example: Acme Corporation

company_name_cleaned (optional)

string

Cleaned company name

Example: Acme Corporation

company_domain (at least 1 required)

string

Company domain

Example: acme.com

company_linkedin (at least 1 required)

string

Company LinkedIn URL

Example: https://linkedin.com/company/acme

company_email (optional)

string

Company email address

Example: info@acme.com

company_phone (optional)

string

Company phone number

Example: +1234567890

company_instagram (optional)

string

Company Instagram URL

Example: https://instagram.com/acme

company_facebook (optional)

string

Company Facebook URL

Example: https://facebook.com/acme

company_xing (optional)

string

Company XING URL

Example: https://xing.com/companies/acme

company_pinterest (optional)

string

Company Pinterest URL

Example: https://pinterest.com/acme

company_tiktok (optional)

string

Company TikTok URL

Example: https://tiktok.com/@acme

company_youtube (optional)

string

Company YouTube URL

Example: https://youtube.com/@acme

company_twitter (optional)

string

Company Twitter/X URL

Example: https://twitter.com/acme

company_legal_form (optional)

string

Company legal form

Example: Inc.

b2b_b2c (optional)

enum: b2b_b2c_type

Business model (B2B or B2C)

Example: B2B

Allowed values:

B2B, B2C, both

company_imprint_name (optional)

string

Company imprint name

Example: Acme Corporation Inc.

company_street (optional)

string

Company street address

Example: Main Street

company_street_nr (optional)

string

Company street number

Example: 123

company_city (optional)

string

Company city

Example: München

company_zip (optional)

string

Company zip code

Example: 10001

company_region (optional)

string

Company region/state

Example: Bayern

company_country (optional)

string

Company country

Example: Germany

company_steuer_nr (optional)

string

Company tax number

Example: 30/321/50964

company_vat_nr (optional)

string

Company VAT number

Example: DE123456789

company_register_nr (optional)

string

Company registration number

Example: HRB 12345

employees_research (optional)

int8

Number of employees from research

Example: 75

employees_linkedin (optional)

int8

Number of employees from LinkedIn

Example: 75

company_founded_year (optional)

int8

Year company was founded

Example: 2010

description (optional)

string

Company description

Example: Leading provider of innovative solutions

company_logo_url (optional)

string

Company logo URL

Example: https://acme.com/logo.png

company_size_linkedin (optional)

string

Company size from LinkedIn

Example: 51-200

company_linkedin_followers (optional)

int8

Number of LinkedIn followers

Example: 1500

company_tags (optional)

enum: text (array)

Company tags

Example: ["technology","saas"]

company_sources (optional)

enum: data_sources (array)

Data sources

Example: ["linkedin","website"]

Allowed values:

lusha, clay, apollo, north_data, d7_lead_finder, storeleads, build_with, sales_navigator

company_qualified (optional)

enum: pending_boolean

Qualification status

Example: qualified

Allowed values:

qualified, pending, not_qualified

company_custom_tags_ws (optional)

enum: text (array)

Custom workspace tags

Example: ["vip","partner"]

POST

curl -X POST "https://web-production-603a8.up.railway.app/company_push_patch" \
  -H "api_key: {{your_api_key}}" \
  -H "Content-Type: application/json" \
  -d '{
  "workspace_id": "00000000-0000-0000-0000-000000000000",
  "company_name": "Acme Corporation",
  "company_name_cleaned": "Acme Corporation",
  "company_domain": "acme.com",
  "company_linkedin": "https://linkedin.com/company/acme",
  "company_email": "info@acme.com",
  "company_phone": "+1234567890",
  "company_instagram": "https://instagram.com/acme",
  "company_facebook": "https://facebook.com/acme",
  "company_xing": "https://xing.com/companies/acme",
  "company_pinterest": "https://pinterest.com/acme",
  "company_tiktok": "https://tiktok.com/@acme",
  "company_youtube": "https://youtube.com/@acme",
  "company_twitter": "https://twitter.com/acme",
  "company_legal_form": "Inc.",
  "b2b_b2c": "B2B",
  "company_imprint_name": "Acme Corporation Inc.",
  "company_street": "Main Street",
  "company_street_nr": "123",
  "company_city": "München",
  "company_zip": "10001",
  "company_region": "Bayern",
  "company_country": "Germany",
  "company_steuer_nr": "30/321/50964",
  "company_vat_nr": "DE123456789",
  "company_register_nr": "HRB 12345",
  "employees_research": 75,
  "employees_linkedin": 75,
  "company_founded_year": 2010,
  "description": "Leading provider of innovative solutions",
  "company_logo_url": "https://acme.com/logo.png",
  "company_size_linkedin": "51-200",
  "company_linkedin_followers": 1500,
  "company_tags": [
    "technology",
    "saas"
  ],
  "company_sources": [
    "linkedin",
    "website"
  ],
  "company_qualified": "qualified",
  "company_custom_tags_ws": [
    "vip",
    "partner"
  ]
}'

Response Parameters

company_main_id (uuid)

UUID of the company (found, created, or updated). null only if operation failed.

Example: 00000000-0000-0000-0000-000000000000

company_workspace_id (uuid)

UUID of the company-workspace connection (found, created, or updated). Only populated if workspace_id was provided in request. null if workspace operation not performed or failed.

Example: 00000000-0000-0000-0000-000000000000

status_company (enum)

Status of the company record operation. Can be "created" (new company created), "updated" (existing company updated), or null (operation failed). Note: Unlike /company_push, this endpoint never returns "found" - it always updates if found.

Example: updated

Possible values:

created, updated

status_company_workspace (enum)

Status of the workspace connection operation. Can be "created" (new workspace connection created), "updated" (existing workspace connection updated), or null (not performed or failed). Only relevant when workspace_id provided in request. Note: Always updates if workspace connection exists.

Example: updated

Possible values:

created, updated

error (string)

Error message if any operation failed. null if all operations succeeded.

Response

{
  "company_main_id": "283f9e1f-89dd-4032-8e02-65acc6856ed1",
  "company_workspace_id": "00000000-0000-0000-0000-000000000000",
  "status_company": "updated",
  "status_company_workspace": "updated",
  "error": null
}
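The status fields returned by /company_push and /company_push_patch share the same shape, so one function can interpret both. The following sketch assumes a hypothetical helper `summarize` and treats a non-null error as a failure, as the field descriptions suggest.

```python
def summarize(response):
    """Turn a push or push_patch response into a short status string."""
    if response.get("error"):
        return "failed: " + response["error"]
    company = response.get("status_company")
    workspace = response.get("status_company_workspace")
    # A null company status is documented as a failed operation.
    if company is None:
        return "failed: no company status returned"
    # A null workspace status means the workspace step was not performed.
    if workspace is None:
        return f"company {company}; no workspace operation"
    return f"company {company}; workspace connection {workspace}"


sample = {
    "company_main_id": "283f9e1f-89dd-4032-8e02-65acc6856ed1",
    "status_company": "updated",
    "status_company_workspace": "updated",
    "error": None,
}
print(summarize(sample))
```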

POST /company_delete_fields

Delete individual fields, identifiers, or array values from an existing company record.

Required: The following fields are mandatory

company_id

Request Body

company_id (required)

uuid

The unique identifier of the company

Example: 00000000-0000-0000-0000-000000000000

company_workspace_id (client_data)

uuid

The workspace identifier

Example: 00000000-0000-0000-0000-000000000000

company_name (optional)

string

Company name to delete

Example: Acme Corporation

company_domain (optional)

string

Company domain to delete

Example: acme.com

company_linkedin (optional)

string

Company LinkedIn URL to delete

Example: https://linkedin.com/company/acme

company_email (optional)

string

Company email address to delete

Example: info@acme.com

company_phone (optional)

string

Company phone number to delete

Example: +491234567890

company_instagram (optional)

string

Company Instagram URL to delete

Example: https://instagram.com/acme

company_facebook (optional)

string

Company Facebook URL to delete

Example: https://facebook.com/acme

company_xing (optional)

string

Company XING URL to delete

Example: https://xing.com/companies/acme

company_pinterest (optional)

string

Company Pinterest URL to delete

Example: https://pinterest.com/acme

company_tiktok (optional)

string

Company TikTok URL to delete

Example: https://tiktok.com/@acme

company_youtube (optional)

string

Company YouTube URL to delete

Example: https://youtube.com/@acme

company_twitter (optional)

string

Company Twitter/X URL to delete

Example: https://twitter.com/acme

company_name_cleaned (optional)

boolean

Set to true to delete the cleaned company name field, false to keep it

Example: false

Possible values: true, false

company_legal_form (optional)

boolean

Set to true to delete the company legal form field, false to keep it

Example: false

Possible values: true, false

b2b_b2c (optional)

boolean

Set to true to delete the business model field, false to keep it

Example: false

Possible values: true, false

company_imprint_name (optional)

boolean

Set to true to delete the company imprint name field, false to keep it

Example: false

Possible values: true, false

company_street (optional)

boolean

Set to true to delete the company street field, false to keep it

Example: false

Possible values: true, false

company_street_nr (optional)

boolean

Set to true to delete the company street number field, false to keep it

Example: false

Possible values: true, false

company_city (optional)

boolean

Set to true to delete the company city field, false to keep it

Example: false

Possible values: true, false

company_zip (optional)

boolean

Set to true to delete the company zip code field, false to keep it

Example: false

Possible values: true, false

company_region (optional)

boolean

Set to true to delete the company region field, false to keep it

Example: false

Possible values: true, false

company_country (optional)

boolean

Set to true to delete the company country field, false to keep it

Example: false

Possible values: true, false

company_steuer_nr (optional)

boolean

Set to true to delete the company tax number field, false to keep it

Example: false

Possible values: true, false

company_vat_nr (optional)

boolean

Set to true to delete the company VAT number field, false to keep it

Example: false

Possible values: true, false

company_register_nr (optional)

boolean

Set to true to delete the company registration number field, false to keep it

Example: false

Possible values: true, false

employees_research (optional)

boolean

Set to true to delete the employees research field, false to keep it

Example: false

Possible values: true, false

employees_linkedin (optional)

boolean

Set to true to delete the employees LinkedIn field, false to keep it

Example: false

Possible values: true, false

company_founded_year (optional)

boolean

Set to true to delete the company founded year field, false to keep it

Example: false

Possible values: true, false

description (optional)

boolean

Set to true to delete the company description field, false to keep it

Example: false

Possible values: true, false

company_logo_url (optional)

boolean

Set to true to delete the company logo URL field, false to keep it

Example: false

Possible values: true, false

company_size_linkedin (optional)

boolean

Set to true to delete the company size LinkedIn field, false to keep it

Example: false

Possible values: true, false

company_linkedin_followers (optional)

boolean

Set to true to delete the LinkedIn followers field, false to keep it

Example: false

Possible values: true, false

company_tags (optional)

enum: text (array)

Array of company tag values to delete from the database array. All inputted values in the array will be removed from the table array field

Example: ["technology","saas"]

company_sources (optional)

enum: data_sources (array)

Array of data source values to delete from the database array. All inputted values in the array will be removed from the table array field

Example: ["linkedin","apollo"]

Allowed values:

lusha, clay, apollo, north_data, d7_lead_finder, storeleads, build_with, sales_navigator

company_qualified (optional)

boolean

Set to true to delete the qualification status field, false to keep it

Example: false

Possible values: true, false

company_custom_tags_ws (optional)

enum: text (array)

Array of custom workspace tag values to delete from the database array. All inputted values in the array will be removed from the table array field

Example: ["vip","partner"]

POST

curl -X POST "https://web-production-603a8.up.railway.app/company_delete_fields" \
  -H "api_key: {{your_api_key}}" \
  -H "Content-Type: application/json" \
  -d '{
  "company_id": "00000000-0000-0000-0000-000000000000",
  "company_workspace_id": "00000000-0000-0000-0000-000000000000",
  "company_name": "Acme Corporation",
  "company_domain": "acme.com",
  "company_linkedin": "https://linkedin.com/company/acme",
  "company_email": "info@acme.com",
  "company_phone": "+491234567890",
  "company_instagram": "https://instagram.com/acme",
  "company_facebook": "https://facebook.com/acme",
  "company_xing": "https://xing.com/companies/acme",
  "company_pinterest": "https://pinterest.com/acme",
  "company_tiktok": "https://tiktok.com/@acme",
  "company_youtube": "https://youtube.com/@acme",
  "company_twitter": "https://twitter.com/acme",
  "company_name_cleaned": false,
  "company_legal_form": false,
  "b2b_b2c": false,
  "company_imprint_name": false,
  "company_street": false,
  "company_street_nr": false,
  "company_city": false,
  "company_zip": false,
  "company_region": false,
  "company_country": false,
  "company_steuer_nr": false,
  "company_vat_nr": false,
  "company_register_nr": false,
  "employees_research": false,
  "employees_linkedin": false,
  "company_founded_year": false,
  "description": false,
  "company_logo_url": false,
  "company_size_linkedin": false,
  "company_linkedin_followers": false,
  "company_tags": [
    "technology",
    "saas"
  ],
  "company_sources": [
    "linkedin",
    "apollo"
  ],
  "company_qualified": false,
  "company_custom_tags_ws": [
    "vip",
    "partner"
  ]
}'

Response Parameters

success (boolean)

Indicates whether the operation completed successfully. true = operation completed (even if no fields were deleted), false = operation failed due to error.

Example: true

message (string)

Detailed message describing what operations were performed. Success format: "Successfully completed: <list of operations>". Message may contain: "Set N fields to NULL" (boolean fields set to NULL in db_companies_main), "Removed N items from company_tags" (tags removed from company_tags array), "Removed N items from company_sources" (sources removed from company_sources array), "Deleted N identifier records" (identifier records deleted from db_companies_dt_identifiers), "Workspace: Set company_qualified to NULL" (workspace field set to NULL), "Workspace: Removed N custom tags" (custom tags removed from workspace).

Example:

Successfully completed: Set 3 fields to NULL; Removed 2 items from company_tags; Deleted 1 identifier records; Workspace: Set company_qualified to NULL, Removed 2 custom tags

Response

{
  "success": true,
  "message": "Successfully completed: Set 3 fields to NULL; Removed 2 items from company_tags; Deleted 1 identifier records; Workspace: Set company_qualified to NULL, Removed 2 custom tags"
}
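This endpoint mixes three kinds of input: identifier values to delete, boolean flags that null out scalar fields, and arrays of values to remove. A minimal sketch of assembling such a body follows; `build_delete_payload` is a hypothetical helper, and it assumes that optional fields can simply be omitted instead of being sent as false.

```python
def build_delete_payload(company_id, clear=(), identifiers=None,
                         tags=(), workspace_id=None):
    """Assemble a /company_delete_fields body.

    clear: scalar field names to null out (sent as true).
    identifiers: mapping such as {"company_email": "info@acme.com"}.
    tags: values to remove from the company_tags array.
    """
    payload = {"company_id": company_id}
    if workspace_id:
        payload["company_workspace_id"] = workspace_id
    payload.update(identifiers or {})
    # Assumption: fields left out are untouched, so false is never sent.
    payload.update({name: True for name in clear})
    if tags:
        payload["company_tags"] = list(tags)
    return payload


payload = build_delete_payload(
    "00000000-0000-0000-0000-000000000000",
    clear=["company_street", "company_zip"],
    identifiers={"company_email": "info@acme.com"},
    tags=["saas"],
)
print(payload)
```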

People

Manage people and contact records. Create, read, update, and delete person information including names, email addresses, phone numbers, and associated company relationships.

GET /contact_get

Get a single person (contact) record by its lead_id.

Required: The following fields are mandatory

lead_id

Query Parameters

lead_id (required)

uuid

The unique identifier of the lead/contact

Example: 00000000-0000-0000-0000-000000000000

workspace_id (client_data)

uuid

The workspace identifier

Example: 00000000-0000-0000-0000-000000000000

company_id

uuid

Company identifier

Example: 00000000-0000-0000-0000-000000000000

GET

curl -X GET "https://web-production-603a8.up.railway.app/contact_get?lead_id=00000000-0000-0000-0000-000000000000&workspace_id=00000000-0000-0000-0000-000000000000&company_id=00000000-0000-0000-0000-000000000000" \
  -H "api_key: {{your_api_key}}" \
  -H "Content-Type: application/json"

Response Parameters

people_id (string)

Example: person-uuid-456

companies_main_id (string)

Example: company-uuid-789

lead_position (string)

Example: Chief Executive Officer

lead_position_cleanedstring

Returns CEO

lead_senioritystring

Returns C-Level

lead_departementstring

Returns Management

still_at_companystring

Returns yes

lead_start_datestring

Returns 2020-01-15

lead_end_dateobject

Position end date. The example response returns null.

lead_seniority_enumstring

Returns c_level

lead_departement_enumstring

Returns management

lead_position_clean_plural_dativstring

Returns CEOs

lead_position_clean_plural_nominativstring

Returns CEOs

lead_summarystring

Returns Experienced executive with 15 years in tech

lead_sourcesarray

Array of values

db_leads_created_atstring

Returns 2024-02-10T09:15:00Z

db_leads_updated_atstring

Returns 2024-03-25T11:20:00Z

person_first_namestring

Returns John

contact_first_name_cleanedstring

Returns John

person_last_namestring

Returns Doe

contact_last_name_cleanedstring

Returns Doe

person_genderstring

Returns male

person_languagestring

Returns de

contact_estimated_birth_yearstring

Returns 1980

contact_birth_yearstring

Returns 1982

contact_birth_datestring

Returns 1982-05-15

person_countrystring

Returns Germany

person_citystring

Returns Berlin

linkedin_cvstring

Returns Extensive experience in software development and leadership

linkedin_volunteeringsstring

Returns Board member at Tech for Good

started_education_linkedinstring

Returns 2000

first_job_start_linkedinstring

Returns 2004

contact_locationstring

Returns Berlin, Germany

contact_academic_titlestring

Returns Dr.

person_statestring

Returns Berlin

person_native_germanstring

Returns yes

person_scooling_countrystring

Returns Germany

contact_linkedin_image_urlstring

Returns https://media.linkedin.com/profile.jpg

person_linkedin_followersstring

Returns 2500

person_linkedin_connectionsstring

Returns 500+

db_people_created_atstring

Returns 2024-01-05T08:30:00Z

db_people_updated_atstring

Returns 2024-03-18T10:45:00Z

contact_linkedinsarray

Array of values

contact_xingsarray

Array of values

contact_emails_validarray

Array of values

contact_emails_invalidarray

Array of values

contact_emails_catch_allarray

Array of values

contact_emails_wrongarray

Array of values

contact_emails_unsurearray

Array of values

lead_workspace_idstring

Returns lead-ws-uuid-345

lead_qualified_wsstring

Returns yes

db_leads_workspace_created_atstring

Returns 2024-02-15T13:00:00Z

db_leads_workspace_updated_atstring

Returns 2024-03-26T15:30:00Z

Response

{
  "people_id": "person-uuid-456",
  "companies_main_id": "company-uuid-789",
  "lead_position": "Chief Executive Officer",
  "lead_position_cleaned": "CEO",
  "lead_seniority": "C-Level",
  "lead_departement": "Management",
  "still_at_company": "yes",
  "lead_start_date": "2020-01-15",
  "lead_end_date": null,
  "lead_seniority_enum": "c_level",
  "lead_departement_enum": "management",
  "lead_position_clean_plural_dativ": "CEOs",
  "lead_position_clean_plural_nominativ": "CEOs",
  "lead_summary": "Experienced executive with 15 years in tech",
  "lead_sources": [
    "linkedin",
    "company_website"
  ],
  "db_leads_created_at": "2024-02-10T09:15:00Z",
  "db_leads_updated_at": "2024-03-25T11:20:00Z",
  "person_first_name": "John",
  "contact_first_name_cleaned": "John",
  "person_last_name": "Doe",
  "contact_last_name_cleaned": "Doe",
  "person_gender": "male",
  "person_language": "de",
  "contact_estimated_birth_year": "1980",
  "contact_birth_year": "1982",
  "contact_birth_date": "1982-05-15",
  "person_country": "Germany",
  "person_city": "Berlin",
  "linkedin_cv": "Extensive experience in software development and leadership",
  "linkedin_volunteerings": "Board member at Tech for Good",
  "started_education_linkedin": "2000",
  "first_job_start_linkedin": "2004",
  "contact_location": "Berlin, Germany",
  "contact_academic_title": "Dr.",
  "person_state": "Berlin",
  "person_native_german": "yes",
  "person_scooling_country": "Germany",
  "contact_linkedin_image_url": "https://media.linkedin.com/profile.jpg",
  "person_linkedin_followers": "2500",
  "person_linkedin_connections": "500+",
  "db_people_created_at": "2024-01-05T08:30:00Z",
  "db_people_updated_at": "2024-03-18T10:45:00Z",
  "contact_linkedins": [
    "https://linkedin.com/in/johndoe",
    "https://linkedin.com/in/john-doe"
  ],
  "contact_xings": [
    "https://xing.com/profile/johndoe"
  ],
  "contact_emails_valid": [
    "john@example.com",
    "j.doe@example.com"
  ],
  "contact_emails_invalid": [
    "oldaddress@defunct.com"
  ],
  "contact_emails_catch_all": [
    "info@example.com"
  ],
  "contact_emails_wrong": [],
  "contact_emails_unsure": [
    "john.doe@maybe.com"
  ],
  "lead_workspace_id": "lead-ws-uuid-345",
  "lead_qualified_ws": "yes",
  "db_leads_workspace_created_at": "2024-02-15T13:00:00Z",
  "db_leads_workspace_updated_at": "2024-03-26T15:30:00Z"
}
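The same request can be issued from Python. This is a sketch using only the standard library; the function names `build_contact_get_url` and `fetch_contact` are hypothetical, and error handling is left out.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

BASE_URL = "https://web-production-603a8.up.railway.app"


def build_contact_get_url(lead_id, workspace_id=None, company_id=None):
    """Build the /contact_get URL; lead_id is the only mandatory parameter."""
    if not lead_id:
        raise ValueError("lead_id is required")
    params = {"lead_id": lead_id}
    if workspace_id:
        params["workspace_id"] = workspace_id
    if company_id:
        params["company_id"] = company_id
    return BASE_URL + "/contact_get?" + urlencode(params)


def fetch_contact(api_key, lead_id, workspace_id=None, company_id=None):
    """Send the GET request and decode the JSON response body."""
    url = build_contact_get_url(lead_id, workspace_id, company_id)
    request = Request(url, headers={"api_key": api_key})
    with urlopen(request) as response:
        return json.load(response)


url = build_contact_get_url("00000000-0000-0000-0000-000000000000")
```

Keeping URL construction separate from the network call makes the parameter handling easy to check without contacting the server.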

POST/contact_lookup

Look up whether a lead and person exist, and push them if they do not.

Required: At least one of the following conditions must be met

contact_linkedin OR contact_xing OR contact_email_valid OR contact_email_catch_all OR contact_email_invalid OR contact_email_unsure OR (company_id AND (contact_first_name OR contact_first_name_cleaned) AND (contact_last_name OR contact_last_name_cleaned))
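The condition above can be checked on the client before sending a request. The following is a minimal sketch; `has_lookup_identifier` is a hypothetical helper, and treating empty strings as missing is an assumption.

```python
DIRECT_IDENTIFIERS = (
    "contact_linkedin",
    "contact_xing",
    "contact_email_valid",
    "contact_email_catch_all",
    "contact_email_invalid",
    "contact_email_unsure",
)


def has_lookup_identifier(body):
    """Return True if the request body satisfies the required condition."""
    # Any single profile URL or email address is enough on its own.
    if any(body.get(key) for key in DIRECT_IDENTIFIERS):
        return True
    # Otherwise company_id plus a first name and a last name is needed,
    # each name in either its raw or its cleaned form.
    has_first = body.get("contact_first_name") or body.get("contact_first_name_cleaned")
    has_last = body.get("contact_last_name") or body.get("contact_last_name_cleaned")
    return bool(body.get("company_id") and has_first and has_last)
```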

Request Body

workspace_idclient_data

uuid

The workspace identifier

Example: 00000000-0000-0000-0000-000000000000

company_idat least 1 required

uuid

Company identifier

Example: 00000000-0000-0000-0000-000000000000

contact_linkedinat least 1 required

string

Contact LinkedIn URL

Example: https://linkedin.com/in/max-mueller

contact_xingat least 1 required

string

Contact XING URL

Example: https://xing.com/profile/max-mueller

contact_email_validat least 1 required

string

Valid email address

Example: max.mueller@example.com

contact_email_catch_allat least 1 required

string

Catch-all email address

Example: info@example.com

contact_email_invalidat least 1 required

string

Invalid email address

Example: invalid@example.com

contact_email_unsureat least 1 required

string

Unsure email address

Example: unsure@example.com

contact_first_nameat least 1 required

string

Contact first name

Example: Max

contact_first_name_cleanedat least 1 required

string

Cleaned contact first name

Example: Max

contact_last_nameat least 1 required

string

Contact last name

Example: Müller

contact_last_name_cleanedat least 1 required

string

Cleaned contact last name

Example: Mueller

POST

curl -X POST "https://web-production-603a8.up.railway.app/contact_lookup" \
  -H "api_key: {{your_api_key}}" \
  -H "Content-Type: application/json" \
  -d '{
  "workspace_id": "00000000-0000-0000-0000-000000000000",
  "company_id": "00000000-0000-0000-0000-000000000000",
  "contact_linkedin": "https://linkedin.com/in/max-mueller",
  "contact_xing": "https://xing.com/profile/max-mueller",
  "contact_email_valid": "max.mueller@example.com",
  "contact_email_catch_all": "info@example.com",
  "contact_email_invalid": "invalid@example.com",
  "contact_email_unsure": "unsure@example.com",
  "contact_first_name": "Max",
  "contact_first_name_cleaned": "Max",
  "contact_last_name": "Müller",
  "contact_last_name_cleaned": "Mueller"
}'

Response Parameters

lead_iduuid

UUID of the lead in db_leads table. Represents the connection between a person and a company. Returns UUID if lead found, null if not found.

Example: 00000000-0000-0000-0000-000000000000

person_iduuid

UUID of the person in db_people table. Returns UUID if person found, null if not found.

Example: 00000000-0000-0000-0000-000000000000

lead_workspace_iduuid

UUID of the lead-workspace connection in db_leads_workspace table. Only populated if workspace_id was provided in request. Returns UUID if workspace connection exists, null if not found.

Example: 00000000-0000-0000-0000-000000000000

people_workspace_iduuid

UUID of the people-workspace connection in db_people_workspace table. Only populated if workspace_id was provided in request. Returns UUID if workspace connection exists, null if not found.

Example: 00000000-0000-0000-0000-000000000000

errorstring

Error message if lookup failed. null if no error occurred.

Response

{
  "lead_id": "lead-uuid-123",
  "person_id": "person-uuid-456",
  "lead_workspace_id": "lead-ws-uuid-789",
  "people_workspace_id": "people-ws-uuid-012",
  "error": null
}
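A caller typically needs to know which records the lookup did not find. A small sketch, assuming the response has already been decoded into a dictionary; `missing_ids` is a hypothetical helper name.

```python
ID_FIELDS = ("lead_id", "person_id", "lead_workspace_id", "people_workspace_id")


def missing_ids(response):
    """List the ID fields that came back as null in a lookup response."""
    error = response.get("error")
    if error:
        raise RuntimeError("contact_lookup failed: " + str(error))
    return [field for field in ID_FIELDS if response.get(field) is None]


example_response = {
    "lead_id": "lead-uuid-123",
    "person_id": "person-uuid-456",
    "lead_workspace_id": "lead-ws-uuid-789",
    "people_workspace_id": "people-ws-uuid-012",
    "error": None,
}
not_found = missing_ids(example_response)
```

Note that the two workspace IDs are only populated when workspace_id was sent, so a null there does not necessarily mean the record is absent.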

POST/contact_push

Push new contact data to database.

Required: At least one of the following conditions must be met

Request Body

workspace_idclient_data

uuid

The workspace identifier

Example: 00000000-0000-0000-0000-000000000000

company_idat least 1 required

uuid

Company identifier

Example: 00000000-0000-0000-0000-000000000000

contact_email_validat least 1 required

string

Valid email address

Example: max.mueller@example.com

contact_email_catch_allat least 1 required

string

Catch-all email address

Example: info@example.com

contact_email_invalidat least 1 required

string

Invalid email address

Example: invalid@example.com

contact_email_unsureat least 1 required

string

Unsure email address

Example: unsure@example.com

contact_linkedinat least 1 required

string

Contact LinkedIn URL

Example: https://linkedin.com/in/max-mueller

contact_xingat least 1 required

string

Contact XING URL

Example: https://xing.com/profile/max-mueller

contact_first_nameat least 1 required

string

Contact first name

Example: Max

contact_first_name_cleanedat least 1 required

string

Cleaned contact first name

Example: Max

contact_last_nameat least 1 required

string

Contact last name

Example: Müller

contact_last_name_cleanedat least 1 required

string

Cleaned contact last name

Example: Mueller

contact_genderoptional

string

Contact gender

Example: Male

contact_languageoptional

string

Contact language

Example: German

contact_estimated_birth_yearoptional

int2

Estimated birth year

Example: 1990

contact_birth_yearoptional

int2

Birth year

Example: 1990

contact_birth_dateoptional

string

Birth date

Example: 1990-05-15

person_countryoptional

string

Person country

Example: Germany

person_cityoptional

string

Person city

Example: München

linkedin_cvoptional

jsonb

LinkedIn CV data

Example: {"experience":[{"company":"Example GmbH","position":"Sales Manager","duration":"2020-2023"}]}

linkedin_volunteeringsoptional

jsonb

LinkedIn volunteering activities

Example: {"organizations":["Non-Profit Organization"]}

started_education_linkedinoptional

int8

Year education started, from LinkedIn

Example: 2010

first_job_start_linkedinoptional

int8

Year the first job started, from LinkedIn

Example: 2015

contact_locationoptional

string

Contact location

Example: München, Bayern, Germany

contact_academic_titleoptional

string

Academic title

Example: Dr.

person_stateoptional

string

Person state/region

Example: Bayern

person_native_germanoptional

string

Native German speaker indicator

Example: true

person_scooling_countryoptional

string

Schooling country

Example: Germany

contact_linkedin_image_urloptional

string

LinkedIn profile image URL

Example: https://media.licdn.com/dms/image/example/profile.jpg

person_linkedin_followersoptional

int8

Number of LinkedIn followers

Example: 500

person_linkedin_connectionsoptional

int8

Number of LinkedIn connections

Example: 300

lead_positionoptional

string

Job position

Example: Sales Manager

lead_position_cleanedoptional

string

Cleaned job position

Example: Sales Manager

lead_seniorityoptional

string

Seniority level

Example: Manager

lead_departementoptional

string

Department

Example: Sales

still_at_companyoptional

bool

Still employed at company

Example: true

Possible values: true, false

lead_start_dateoptional

date

Position start date

Example: 2020-01-01

lead_end_dateoptional

date

Position end date

Example: 2023-12-31

lead_seniority_enumoptional

enum: contact_seniority

Seniority level enum

Example: manager

Allowed values:

c-level, geschäftsführung, head, manager, entry, director, partner, president, intern

lead_departement_enumoptional

enum: contact_departement

Department enum

Example: sales

Allowed values:

marketing, sales, geschäftsführung, procurement, legal, accounting, finance

lead_position_clean_plural_dativoptional

string

Position in plural dativ form

Example: Vertriebsleitern

lead_position_clean_plural_nominativoptional

string

Position in plural nominativ form

Example: Vertriebsleiter

lead_summaryoptional

string

Lead summary

Example: Experienced sales professional with 10+ years in B2B software sales

lead_sourcesoptional

enum: data_sources (array)

Lead sources

Example: ["apollo","sales_navigator"]

Allowed values:

lusha, clay, apollo, north_data, d7_lead_finder, storeleads, build_with, sales_navigator

lead_qualified_wsoptional

enum: pending_boolean

Workspace qualification status

Example: qualified

Allowed values:

qualified, pending, not_qualified

POST

curl -X POST "https://web-production-603a8.up.railway.app/contact_push" \
  -H "api_key: {{your_api_key}}" \
  -H "Content-Type: application/json" \
  -d '{
  "workspace_id": "00000000-0000-0000-0000-000000000000",
  "company_id": "00000000-0000-0000-0000-000000000000",
  "contact_email_valid": "max.mueller@example.com",
  "contact_email_catch_all": "info@example.com",
  "contact_email_invalid": "invalid@example.com",
  "contact_email_unsure": "unsure@example.com",
  "contact_linkedin": "https://linkedin.com/in/max-mueller",
  "contact_xing": "https://xing.com/profile/max-mueller",
  "contact_first_name": "Max",
  "contact_first_name_cleaned": "Max",
  "contact_last_name": "Müller",
  "contact_last_name_cleaned": "Mueller",
  "contact_gender": "Male",
  "contact_language": "German",
  "contact_estimated_birth_year": 1990,
  "contact_birth_year": 1990,
  "contact_birth_date": "1990-05-15",
  "person_country": "Germany",
  "person_city": "München",
  "linkedin_cv": {
    "experience": [
      {
        "company": "Example GmbH",
        "position": "Sales Manager",
        "duration": "2020-2023"
      }
    ]
  },
  "linkedin_volunteerings": {
    "organizations": [
      "Non-Profit Organization"
    ]
  },
  "started_education_linkedin": 2010,
  "first_job_start_linkedin": 2015,
  "contact_location": "München, Bayern, Germany",
  "contact_academic_title": "Dr.",
  "person_state": "Bayern",
  "person_native_german": "true",
  "person_scooling_country": "Germany",
  "contact_linkedin_image_url": "https://media.licdn.com/dms/image/example/profile.jpg",
  "person_linkedin_followers": 500,
  "person_linkedin_connections": 300,
  "lead_position": "Sales Manager",
  "lead_position_cleaned": "Sales Manager",
  "lead_seniority": "Manager",
  "lead_departement": "Sales",
  "still_at_company": true,
  "lead_start_date": "2020-01-01",
  "lead_end_date": "2023-12-31",
  "lead_seniority_enum": "manager",
  "lead_departement_enum": "sales",
  "lead_position_clean_plural_dativ": "Vertriebsleitern",
  "lead_position_clean_plural_nominativ": "Vertriebsleiter",
  "lead_summary": "Experienced sales professional with 10+ years in B2B software sales",
  "lead_sources": [
    "apollo",
    "sales_navigator"
  ],
  "lead_qualified_ws": "qualified"
}'
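Because several request fields accept only enum values, it can help to validate a payload locally before pushing. The sketch below transcribes the value lists from the Allowed values entries of the request body; treat them as assumptions and confirm them against the live API. The names `ALLOWED_VALUES`, `ALLOWED_SOURCES`, and `enum_errors` are hypothetical.

```python
ALLOWED_VALUES = {
    "lead_seniority_enum": {
        "c-level", "geschäftsführung", "head", "manager", "entry",
        "director", "partner", "president", "intern",
    },
    "lead_departement_enum": {
        "marketing", "sales", "geschäftsführung", "procurement",
        "legal", "accounting", "finance",
    },
    "lead_qualified_ws": {"qualified", "pending", "not_qualified"},
}

ALLOWED_SOURCES = {
    "lusha", "clay", "apollo", "north_data", "d7_lead_finder",
    "storeleads", "build_with", "sales_navigator",
}


def enum_errors(payload):
    """Return one message per enum field whose value is not allowed."""
    errors = []
    for field, allowed in ALLOWED_VALUES.items():
        value = payload.get(field)
        if value is not None and value not in allowed:
            errors.append("%s: %r is not an allowed value" % (field, value))
    # lead_sources is an array, so each element is checked separately.
    for source in payload.get("lead_sources") or []:
        if source not in ALLOWED_SOURCES:
            errors.append("lead_sources: %r is not an allowed value" % (source,))
    return errors
```

An empty list means every enum field that is present holds a listed value; fields that are absent are skipped because they are optional.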

Response Parameters

person_iduuid

UUID of the person record (either found or newly created). null only if person operation failed.

Example: 00000000-0000-0000-0000-000000000000

lead_iduuid

UUID of the lead record (either found or newly created). Lead connects a person to a company with position/role information. null if lead operation not performed or failed. Requires both person_id and company_id to be present.

Example: 00000000-0000-0000-0000-000000000000

people_workspace_iduuid

UUID of the people-workspace connection (either found or newly created). Only populated if workspace_id was provided in request. null if workspace operation not performed or failed.

Example: 00000000-0000-0000-0000-000000000000

lead_workspace_iduuid

UUID of the lead-workspace connection (either found or newly created). Only populated if workspace_id was provided in request. null if workspace operation not performed or failed.

Example: 00000000-0000-0000-0000-000000000000

status_personenum

Status of the person record operation. Can be "found" (person already existed), "created" (new person created), or null (operation failed).

Example: created

Possible values:

found, created

status_leadenum

Status of the lead record operation. Can be "found" (lead already existed), "created" (new lead created), or null (not performed or failed). Only created if both person_id and company_id exist.

Example: created

Possible values:

found, created

status_people_workspaceenum

Status of the people-workspace connection operation. Can be "found" (workspace connection already existed), "created" (new workspace connection created), or null (not performed or failed). Only relevant when workspace_id provided in request.

Example: created

Possible values:

found, created

status_lead_workspaceenum

Status of the lead-workspace connection operation. Can be "found" (workspace connection already existed), "created" (new workspace connection created), or null (not performed or failed). Only relevant when workspace_id provided in request.

Example: created

Possible values:

found, created

errorstring

Error message if any operation failed. null if all operations succeeded.

Response

{
  "person_id": "person-uuid-456",
  "lead_id": "lead-uuid-123",
  "people_workspace_id": "people-ws-uuid-012",
  "lead_workspace_id": "lead-ws-uuid-789",
  "status_person": "created",
  "status_lead": "created",
  "status_people_workspace": "created",
  "status_lead_workspace": "created",
  "error": null
}
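The four status fields can be grouped to see at a glance which records were created and which already existed. A minimal sketch, assuming a decoded response dictionary; `group_by_status` and the `"not_performed_or_failed"` label for null statuses are hypothetical.

```python
STATUS_FIELDS = (
    "status_person",
    "status_lead",
    "status_people_workspace",
    "status_lead_workspace",
)


def group_by_status(response):
    """Map each status value to the record types that reported it."""
    groups = {}
    for field in STATUS_FIELDS:
        status = response.get(field)
        # A null status means the operation was not performed or failed.
        key = status if status is not None else "not_performed_or_failed"
        record_type = field[len("status_"):]
        groups.setdefault(key, []).append(record_type)
    return groups


example_response = {
    "status_person": "found",
    "status_lead": "created",
    "status_people_workspace": "created",
    "status_lead_workspace": None,
    "error": None,
}
summary = group_by_status(example_response)
```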

POST/contact_push_patch

Update existing contact data in database.

Required: At least one of the following conditions must be met

Request Body

workspace_idclient_data

uuid

The workspace identifier

Example: 00000000-0000-0000-0000-000000000000

company_idat least 1 required

uuid

Company identifier

Example: 00000000-0000-0000-0000-000000000000

contact_email_validat least 1 required

string

Valid email address

Example: max.mueller@example.com

contact_email_catch_allat least 1 required

string

Catch-all email address

Example: info@example.com

contact_email_invalidat least 1 required

string

Invalid email address

Example: invalid@example.com

contact_email_unsureat least 1 required

string

Unsure email address

Example: unsure@example.com

contact_linkedinat least 1 required

string

Contact LinkedIn URL

Example: https://linkedin.com/in/max-mueller

contact_xingat least 1 required

string

Contact XING URL

Example: https://xing.com/profile/max-mueller

contact_first_nameat least 1 required

string

Contact first name

Example: Max

contact_first_name_cleanedat least 1 required

string

Cleaned contact first name

Example: Max

contact_last_nameat least 1 required

string

Contact last name

Example: Müller

contact_last_name_cleanedat least 1 required

string

Cleaned contact last name

Example: Mueller

contact_genderoptional

string

Contact gender

Example: Male

contact_languageoptional

string

Contact language

Example: German

contact_estimated_birth_yearoptional

int2

Estimated birth year

Example: 1990

contact_birth_yearoptional

int2

Birth year

Example: 1990

contact_birth_dateoptional

string

Birth date

Example: 1990-05-15

person_countryoptional

string

Person country

Example: Germany

person_cityoptional

string

Person city

Example: München

linkedin_cvoptional

jsonb

LinkedIn CV data

Example: {"experience":[{"company":"Example GmbH","position":"Sales Manager","duration":"2020-2023"}]}

linkedin_volunteeringsoptional

jsonb

LinkedIn volunteering activities

Example: {"organizations":["Non-Profit Organization"]}

started_education_linkedinoptional

int8

Year education started, from LinkedIn

Example: 2010

first_job_start_linkedinoptional

int8

Year the first job started, from LinkedIn

Example: 2015

contact_locationoptional

string

Contact location

Example: München, Bayern, Germany

contact_academic_titleoptional

string

Academic title

Example: Dr.

person_stateoptional

string

Person state/region

Example: Bayern

person_native_germanoptional

string

Native German speaker indicator

Example: true

person_scooling_countryoptional

string

Schooling country

Example: Germany

contact_linkedin_image_urloptional

string

LinkedIn profile image URL

Example: https://media.licdn.com/dms/image/example/profile.jpg

person_linkedin_followersoptional

int8

Number of LinkedIn followers

Example: 500

person_linkedin_connectionsoptional

int8

Number of LinkedIn connections

Example: 300

lead_positionoptional

string

Job position

Example: Sales Manager

lead_position_cleanedoptional

string

Cleaned job position

Example: Sales Manager

lead_seniorityoptional

string

Seniority level

Example: Manager

lead_departementoptional

string

Department

Example: Sales

still_at_companyoptional

bool

Still employed at company

Example: true

Possible values: true, false

lead_start_dateoptional

date

Position start date

Example: 2020-01-01

lead_end_dateoptional

date

Position end date

Example: 2023-12-31

lead_position_clean_plural_dativoptional

string

Position in plural dativ form

Example: Vertriebsleitern

lead_position_clean_plural_nominativoptional

string

Position in plural nominativ form

Example: Vertriebsleiter

lead_summaryoptional

string

Lead summary

Example: Experienced sales professional with 10+ years in B2B software sales

lead_sourcesoptional

enum: data_sources (array)

Lead sources

Example: ["apollo","sales_navigator"]

Allowed values:

lusha, clay, apollo, north_data, d7_lead_finder, storeleads, build_with, sales_navigator

lead_qualified_wsoptional

enum: pending_boolean

Workspace qualification status

Example: qualified

Allowed values:

qualified, pending, not_qualified

POST

curl -X POST "https://web-production-603a8.up.railway.app/contact_push_patch" \
  -H "api_key: {{your_api_key}}" \
  -H "Content-Type: application/json" \
  -d '{
  "workspace_id": "00000000-0000-0000-0000-000000000000",
  "company_id": "00000000-0000-0000-0000-000000000000",
  "contact_email_valid": "max.mueller@example.com",
  "contact_email_catch_all": "info@example.com",
  "contact_email_invalid": "invalid@example.com",
  "contact_email_unsure": "unsure@example.com",
  "contact_linkedin": "https://linkedin.com/in/max-mueller",
  "contact_xing": "https://xing.com/profile/max-mueller",
  "contact_first_name": "Max",
  "contact_first_name_cleaned": "Max",
  "contact_last_name": "Müller",
  "contact_last_name_cleaned": "Mueller",
  "contact_gender": "Male",
  "contact_language": "German",
  "contact_estimated_birth_year": 1990,
  "contact_birth_year": 1990,
  "contact_birth_date": "1990-05-15",
  "person_country": "Germany",
  "person_city": "München",
  "linkedin_cv": {
    "experience": [
      {
        "company": "Example GmbH",
        "position": "Sales Manager",
        "duration": "2020-2023"
      }
    ]
  },
  "linkedin_volunteerings": {
    "organizations": [
      "Non-Profit Organization"
    ]
  },
  "started_education_linkedin": 2010,
  "first_job_start_linkedin": 2015,
  "contact_location": "München, Bayern, Germany",
  "contact_academic_title": "Dr.",
  "person_state": "Bayern",
  "person_native_german": "true",
  "person_scooling_country": "Germany",
  "contact_linkedin_image_url": "https://media.licdn.com/dms/image/example/profile.jpg",
  "person_linkedin_followers": 500,
  "person_linkedin_connections": 300,
  "lead_position": "Sales Manager",
  "lead_position_cleaned": "Sales Manager",
  "lead_seniority": "Manager",
  "lead_departement": "Sales",
  "still_at_company": true,
  "lead_start_date": "2020-01-01",
  "lead_end_date": "2023-12-31",
  "lead_position_clean_plural_dativ": "Vertriebsleitern",
  "lead_position_clean_plural_nominativ": "Vertriebsleiter",
  "lead_summary": "Experienced sales professional with 10+ years in B2B software sales",
  "lead_sources": [
    "apollo",
    "sales_navigator"
  ],
  "lead_qualified_ws": "qualified"
}'

Response Parameters

person_iduuid

UUID of the person record (found, created, or updated). null only if person operation failed.

Example: 00000000-0000-0000-0000-000000000000

lead_iduuid

UUID of the lead record (found, created, or updated). Lead connects a person to a company with position/role information. null if lead operation not performed or failed. Requires both person_id and company_id to be present.

Example: 00000000-0000-0000-0000-000000000000

people_workspace_iduuid

UUID of the people-workspace connection (found, created, or updated). Only populated if workspace_id was provided in request. null if workspace operation not performed or failed.

Example: 00000000-0000-0000-0000-000000000000

lead_workspace_iduuid

UUID of the lead-workspace connection (found, created, or updated). Only populated if workspace_id was provided in request. null if workspace operation not performed or failed.

Example: 00000000-0000-0000-0000-000000000000

status_personenum

Status of the person record operation. Can be "created" (new person created), "updated" (existing person updated), or null (operation failed). Note: Unlike /contact_push, this endpoint never returns "found" - it always updates if found.

Example: updated

Possible values:

created, updated

status_leadenum

Status of the lead record operation. Can be "created" (new lead created), "updated" (existing lead updated), or null (not performed or failed). Note: Always updates if lead exists.

Example: updated

Possible values:

created, updated

status_people_workspaceenum

Status of the people-workspace connection operation. Can be "created" (new workspace connection created), "updated" (existing workspace connection updated), or null (not performed or failed). Only relevant when workspace_id provided in request. Note: Always updates if workspace connection exists.

Example: updated

Possible values:

created, updated

status_lead_workspaceenum

Status of the lead-workspace connection operation. Can be "created" (new workspace connection created), "updated" (existing workspace connection updated), or null (not performed or failed). Only relevant when workspace_id provided in request. Note: Always updates if workspace connection exists.

Example: updated

Possible values:

created, updated

errorstring

Error message if any operation failed. null if all operations succeeded.

Response

{
  "person_id": "person-uuid-456",
  "lead_id": "lead-uuid-123",
  "people_workspace_id": "people-ws-uuid-012",
  "lead_workspace_id": "lead-ws-uuid-789",
  "status_person": "updated",
  "status_lead": "updated",
  "status_people_workspace": "updated",
  "status_lead_workspace": "updated",
  "error": null
}

POST/contact_delete_fields

Delete specific fields and values from an existing person and lead record.

Required: The following fields are mandatory

people_id AND lead_id

Request Body

people_idrequired

uuid

The unique identifier of the person

Example: 00000000-0000-0000-0000-000000000000

lead_idrequired

uuid

The unique identifier of the lead

Example: 00000000-0000-0000-0000-000000000000

people_workspace_idclient_data

uuid

The workspace identifier for people

Example: 00000000-0000-0000-0000-000000000000

leads_workspace_idclient_data

uuid

The workspace identifier for leads

Example: 00000000-0000-0000-0000-000000000000

contact_linkedinoptional

string

The exact LinkedIn URL value to delete from the database

Example: https://linkedin.com/in/john-doe

contact_xingoptional

string

The exact XING URL value to delete from the database

Example: https://xing.com/profile/john-doe

contact_email_validoptional

string

The exact valid email address to delete from the database

Example: john.doe@example.com

contact_email_catch_alloptional

string

The exact catch-all email address to delete from the database

Example: contact@example.com

contact_email_invalidoptional

string

The exact invalid email address to delete from the database

Example: invalid@example.com

contact_email_unsureoptional

string

The exact unsure email address to delete from the database

Example: unsure@example.com

contact_first_name_cleanedoptional

boolean

Set to true to delete the cleaned first name field, false to keep it

Example: false

Possible values: true, false

contact_last_name_cleanedoptional

boolean

Set to true to delete the cleaned last name field, false to keep it

Example: false

Possible values: true, false

person_genderoptional

boolean

Set to true to delete the gender field, false to keep it

Example: false

Possible values: true, false

person_languageoptional

boolean

Set to true to delete the language field, false to keep it

Example: false

Possible values: true, false

contact_estimated_birth_yearoptional

boolean

Set to true to delete the estimated birth year field, false to keep it

Example: false

Possible values: true, false

contact_birth_yearoptional

boolean

Set to true to delete the birth year field, false to keep it

Example: false

Possible values: true, false

contact_birth_dateoptional

boolean

Set to true to delete the birth date field, false to keep it

Example: false

Possible values: true, false

person_countryoptional

boolean

Set to true to delete the country field, false to keep it

Example: false

Possible values: true, false

person_cityoptional

boolean

Set to true to delete the city field, false to keep it

Example: false

Possible values: true, false

linkedin_cvoptional

boolean

Set to true to delete the LinkedIn CV field, false to keep it

Example: false

Possible values: true, false

linkedin_volunteeringsoptional

boolean

Set to true to delete the LinkedIn volunteerings field, false to keep it

Example: false

Possible values: true, false

started_education_linkedinoptional

boolean

Set to true to delete the education start date field, false to keep it

Example: false

Possible values: true, false

first_job_start_linkedinoptional

boolean

Set to true to delete the first job start date field, false to keep it

Example: false

Possible values: true, false

contact_locationoptional

boolean

Set to true to delete the location field, false to keep it

Example: false

Possible values: true, false

contact_academic_titleoptional

boolean

Set to true to delete the academic title field, false to keep it

Example: false

Possible values: true, false

person_stateoptional

boolean

Set to true to delete the state field, false to keep it

Example: false

Possible values: true, false

person_native_germanoptional

boolean

Set to true to delete the native German field, false to keep it

Example: false

Possible values: true, false

person_scooling_countryoptional

boolean

Set to true to delete the schooling country field, false to keep it

Example: false

Possible values: true, false

contact_linkedin_image_urloptional

boolean

Set to true to delete the LinkedIn image URL field, false to keep it

Example: false

Possible values: true, false

person_linkedin_followersoptional

boolean

Set to true to delete the LinkedIn followers field, false to keep it

Example: false

Possible values: true, false

person_linkedin_connectionsoptional

boolean

Set to true to delete the LinkedIn connections field, false to keep it

Example: false

Possible values: true, false

lead_positionoptional

boolean

Set to true to delete the position field, false to keep it

Example: false

Possible values: true, false

lead_position_cleanedoptional

boolean

Set to true to delete the cleaned position field, false to keep it

Example: false

Possible values: true, false

lead_seniorityoptional

boolean

Set to true to delete the seniority field, false to keep it

Example: false

Possible values: true, false

lead_departementoptional

boolean

Set to true to delete the department field, false to keep it

Example: false

Possible values: true, false

still_at_companyoptional

boolean

Set to true to delete the still at company field, false to keep it

Example: false

Possible values: true, false

lead_start_dateoptional

boolean

Set to true to delete the start date field, false to keep it

Example: false

Possible values: true, false

lead_end_dateoptional

boolean

Set to true to delete the end date field, false to keep it

Example: false

Possible values: true, false

lead_seniority_enum (optional)

boolean

Set to true to delete the seniority enum field, false to keep it

Example: false

Possible values: true, false

lead_departement_enum (optional)

boolean

Set to true to delete the department enum field, false to keep it

Example: false

Possible values: true, false

lead_position_clean_plural_dativ (optional)

boolean

Set to true to delete the position plural dativ field, false to keep it

Example: false

Possible values: true, false

lead_position_clean_plural_nominativ (optional)

boolean

Set to true to delete the position plural nominativ field, false to keep it

Example: false

Possible values: true, false

lead_summary (optional)

boolean

Set to true to delete the lead summary field, false to keep it

Example: false

Possible values: true, false

lead_sources (optional)

enum: data_sources (array)

Array of lead source values to remove from the stored lead_sources array. Every value listed here is removed from that array field; values not listed are kept.

Example: ["apollo","sales_navigator"]

Allowed values:

lusha, clay, apollo, north_data, d7_lead_finder, storeleads, build_with, sales_navigator

lead_qualified_ws (optional)

boolean

Set to true to delete the workspace qualification field, false to keep it

Example: false

Possible values: true, false

POST

curl -X POST "https://web-production-603a8.up.railway.app/contact_delete_fields" \
  -H "api_key: {{your_api_key}}" \
  -H "Content-Type: application/json" \
  -d '{
  "people_id": "00000000-0000-0000-0000-000000000000",
  "lead_id": "00000000-0000-0000-0000-000000000000",
  "people_workspace_id": "00000000-0000-0000-0000-000000000000",
  "leads_workspace_id": "00000000-0000-0000-0000-000000000000",
  "contact_linkedin": "https://linkedin.com/in/john-doe",
  "contact_xing": "https://xing.com/profile/john-doe",
  "contact_email_valid": "john.doe@example.com",
  "contact_email_catch_all": "contact@example.com",
  "contact_email_invalid": "invalid@example.com",
  "contact_email_unsure": "unsure@example.com",
  "contact_first_name_cleaned": false,
  "contact_last_name_cleaned": false,
  "person_gender": false,
  "person_language": false,
  "contact_estimated_birth_year": false,
  "contact_birth_year": false,
  "contact_birth_date": false,
  "person_country": false,
  "person_city": false,
  "linkedin_cv": false,
  "linkedin_volunteerings": false,
  "started_education_linkedin": false,
  "first_job_start_linkedin": false,
  "contact_location": false,
  "contact_academic_title": false,
  "person_state": false,
  "person_native_german": false,
  "person_scooling_country": false,
  "contact_linkedin_image_url": false,
  "person_linkedin_followers": false,
  "person_linkedin_connections": false,
  "lead_position": false,
  "lead_position_cleaned": false,
  "lead_seniority": false,
  "lead_departement": false,
  "still_at_company": false,
  "lead_start_date": false,
  "lead_end_date": false,
  "lead_seniority_enum": false,
  "lead_departement_enum": false,
  "lead_position_clean_plural_dativ": false,
  "lead_position_clean_plural_nominativ": false,
  "lead_summary": false,
  "lead_sources": [
    "apollo",
    "sales_navigator"
  ],
  "lead_qualified_ws": false
}'

Response Parameters

success (boolean)

Indicates whether the operation completed successfully. true = operation completed (even if no fields were deleted), false = operation failed due to error.

Example: true

message (string)

Detailed message describing which operations were performed.

Success format: "Successfully completed: <list of operations>". The list may contain:

People Operations:

• "People: Set N fields to NULL" (boolean fields set to NULL in db_people)
• "People: Deleted N identifier records" (identifier records deleted from db_people_identifiers for LinkedIn/Xing)

Leads Operations:

• "Leads: Set N fields to NULL" (boolean fields set to NULL in db_leads)
• "Leads: Removed N items from lead_sources" (sources removed from lead_sources array)
• "Leads: Deleted N email identifier records" (email identifier records deleted from db_leads_identifiers)

Workspace Operations:

• "People Workspace: No fields to update (table only contains IDs)" (people workspace table has no deletable fields)
• "Leads Workspace: Set lead_qualified_ws to NULL" (workspace qualification field set to NULL)

No operations format: "No operations performed (no fields specified for deletion)"

Error format: "Database error: <error details>" or "Error: <error details>"

Example:

Successfully completed: People: Set 2 fields to NULL; Leads: Set 3 fields to NULL; Leads: Removed 1 items from lead_sources; People: Deleted 1 identifier records; Leads: Deleted 2 email identifier records; Leads Workspace: Set lead_qualified_ws to NULL

Response

{
  "success": true,
  "message": "Successfully completed: People: Set 2 fields to NULL; Leads: Set 3 fields to NULL; Leads: Removed 1 items from lead_sources; People: Deleted 1 identifier records; Leads: Deleted 2 email identifier records; Leads Workspace: Set lead_qualified_ws to NULL"
}
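
Clients that need the individual operations rather than the full string can split the success message on its separator. The following is a minimal Python sketch, assuming the "Successfully completed: " prefix and the "; " separator shown in the example stay stable; parse_delete_message is a hypothetical helper and not part of the API.

```python
def parse_delete_message(message: str) -> list[str]:
    """Split a contact_delete_fields success message into its operations."""
    prefix = "Successfully completed: "
    if not message.startswith(prefix):
        # "No operations performed ..." and error messages carry no list.
        return []
    return [part.strip() for part in message[len(prefix):].split(";") if part.strip()]


example = (
    "Successfully completed: People: Set 2 fields to NULL; "
    "Leads: Removed 1 items from lead_sources"
)
operations = parse_delete_message(example)
```

Checking the success flag first and parsing the message only afterwards keeps error strings from being mistaken for an empty operation list.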

Field Cleaning

Overview

Field cleaning is a critical data normalization process that ensures consistency, improves matching accuracy, and prevents duplicate records in the database. All incoming data is cleaned before database operations (lookup, push, push_patch).

When Field Cleaning is Applied

•Before company lookup operations
•Before contact/lead lookup operations
•Before pushing new company data
•Before pushing new contact/lead data
•Before updating existing records (push_patch operations)

Purpose

•Standardize data formats for accurate matching
•Remove inconsistencies and variations
•Enable reliable deduplication
•Improve data quality

Pre-Processing Stage (Applied to ALL Fields)

Before any field-specific cleaning, ALL string fields undergo standardization:

1. Whitespace Stripping

Purpose: Remove leading/trailing spaces that cause matching failures

Input:

" example.com "

Output:

"example.com"

2. Quote Normalization (normalize_quotes)

Purpose: Convert all Unicode quote characters to standard ASCII quotes

Characters Replaced:

• ’ (U+2019 - RIGHT SINGLE QUOTATION MARK) → '
• ‘ (U+2018 - LEFT SINGLE QUOTATION MARK) → '
• ` (U+0060 - GRAVE ACCENT/BACKTICK) → '
• ´ (U+00B4 - ACUTE ACCENT) → '
• “ (U+201C - LEFT DOUBLE QUOTATION MARK) → "
• ” (U+201D - RIGHT DOUBLE QUOTATION MARK) → "

Input (smart quotes):

"O’Brien’s Company"

Output (regular apostrophes):

"O'Brien's Company"

Why This Matters:

• Smart quotes come from copy-paste from Word, PDFs, websites
• Database comparisons fail when quotes don't match
• Enables consistent matching across data sources
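
The replacement list above maps naturally onto a translation table. This is a minimal Python sketch of that idea, not the platform's actual normalize_quotes code; the name normalize_quotes_sketch and the table layout are assumptions made for illustration.

```python
# Translation table built from the "Characters Replaced" list.
_QUOTE_TABLE = str.maketrans({
    "\u2019": "'",  # right single quotation mark
    "\u2018": "'",  # left single quotation mark
    "\u0060": "'",  # grave accent / backtick
    "\u00b4": "'",  # acute accent
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
})


def normalize_quotes_sketch(value: str) -> str:
    """Replace typographic quote characters with their ASCII equivalents."""
    return value.translate(_QUOTE_TABLE)


cleaned = normalize_quotes_sketch("O\u2019Brien\u2019s Company")
```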

3. Empty Field Removal

Purpose: Remove fields that are null, empty string, or whitespace-only

Removal Criteria:

• None → Removed
• "" → Removed
• " " → Removed (becomes "" after strip)

Impact:

• Reduces payload size
• Prevents NULL constraint violations
• Improves database performance
• Fields not removed: 0, False, [] (valid data)

Order of Operations

1. Strip whitespace from all string fields
2. Normalize quotes in all string fields
3. Convert empty/whitespace fields to None
4. Apply field-specific cleaning (domain, LinkedIn, etc.)
5. Remove all None/empty fields from payload
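
The order of operations can be sketched as a single pass over the payload. This is an illustrative Python outline only: preprocess_sketch is a hypothetical name, step 4 (field-specific cleaning) is left as a placeholder, and the quote table repeats the characters listed under Quote Normalization.

```python
from typing import Any

_QUOTES = str.maketrans({
    "\u2019": "'", "\u2018": "'", "\u0060": "'", "\u00b4": "'",
    "\u201c": '"', "\u201d": '"',
})


def preprocess_sketch(data: dict[str, Any]) -> dict[str, Any]:
    """Apply steps 1-3 and 5 of the order of operations to a payload."""
    staged: dict[str, Any] = {}
    for key, value in data.items():
        if isinstance(value, str):
            value = value.strip().translate(_QUOTES)  # steps 1 and 2
            if value == "":
                value = None  # step 3
        staged[key] = value
    # Step 4 (field-specific cleaning) would run here on `staged`.
    # Step 5: drop None only, so 0, False and [] survive as valid data.
    return {k: v for k, v in staged.items() if v is not None}


result = preprocess_sketch(
    {"a": " example.com ", "b": "   ", "c": None, "d": 0, "e": False, "f": []}
)
```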

Company Field Cleaning

The clean_company_fields(data: dict) → dict function processes company data through multiple stages.

1. Domain Cleaning (clean_domain)

Purpose: Normalize website URLs to consistent domain format for reliable matching

Detailed Algorithm:

1. Strip Whitespace: Remove leading/trailing spaces
2. Protocol Removal: Remove prefixes in order of priority:
- https://www.
- http://www.
- https://
- http://
- www.
3. Path Removal: Split by / and take only first part (domain)
4. Query Parameter Removal: Split by ? and take only first part
5. Null on Empty: If result is empty string → NULL
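
The steps above can be sketched in a few lines of Python. This is an approximation rather than the actual clean_domain source: the function name is hypothetical, and two behaviors are assumptions inferred from this section rather than from the listed steps, namely lowercasing (implied by the "HTTP://WWW.COMPANY.DE/" example) and fragment removal (listed under Edge Cases).

```python
from typing import Optional

# Prefixes in the priority order given in step 2.
_PREFIXES = ("https://www.", "http://www.", "https://", "http://", "www.")


def clean_domain_sketch(url: Optional[str]) -> Optional[str]:
    """Reduce a URL-like string to its bare domain, or None if nothing is left."""
    if url is None:
        return None
    domain = url.strip().lower()  # lowercasing is an assumption (see lead-in)
    for prefix in _PREFIXES:
        if domain.startswith(prefix):
            domain = domain[len(prefix):]
            break  # only the first matching prefix is removed
    domain = domain.split("/")[0]  # path removal
    domain = domain.split("?")[0]  # query parameter removal
    domain = domain.split("#")[0]  # fragment removal (assumption)
    return domain or None


example = clean_domain_sketch("https://www.example.com/about-us?ref=home")
```

Because the longer prefixes are tested first, "https://www." is removed in one step instead of leaving a stray "www." behind.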

Examples:

✓ Valid Examples:

Input:

"https://www.example.com/about-us?ref=home"

Output:

"example.com"

Input:

"HTTP://WWW.COMPANY.DE/"

Output:

"company.de"

Input:

"subdomain.example.com/products"

Output:

"subdomain.example.com"

Input:

"www.test.org"

Output:

"test.org"

Input:

"https://api.service.com/v1/endpoint"

Output:

"api.service.com"

✗ Invalid (becomes NULL):

Input:

"" (empty string)

Output:

NULL

Input:

" " (whitespace only)

Output:

NULL

Input:

"https://" (no domain after protocol)

Output:

NULL

Edge Cases:

Handling:

• Subdomains: Preserved intact (e.g., blog.company.com)
• Fragments: Removed (e.g., example.com#section → example.com)
• Multiple Slashes: Only first part kept
• Port Numbers: Preserved (e.g., localhost:8080)
• International Domains: Preserved as-is

Validation Logic:

• Does NOT validate actual domain format
• Does NOT check TLD validity (.com, .de)
• Does NOT perform DNS lookups
• Simply extracts and normalizes domain portion
• Allows localhost, 127.0.0.1

Error Handling:

• No try-except block needed (string operations only)
• Empty results after cleaning → NULL

Fields Cleaned:

• company_domain
• All domain identifiers in company_domains array

Impact on Matching:

Without Cleaning:

"https://www.example.com" ≠ "example.com" → Creates duplicate

With Cleaning:

"https://www.example.com" = "example.com" → Prevents duplicate

2. LinkedIn URL Cleaning (clean_linkedin_url)

Purpose: Validate and standardize LinkedIn company page URLs

Detailed Algorithm:

1. Validation Check: URL must contain linkedin.com/company/
2. URL Splitting: Split URL by linkedin.com/company/
3. Slug Extraction:
- Take everything after linkedin.com/company/
- Remove trailing paths (split by /, take first part)
- Remove query parameters (split by ?, take first part)
4. Length Validation: Slug must be at least 2 characters long
5. Reconstruction: Build URL as https://www.linkedin.com/company/{slug}
6. Null on Failure: Set to NULL if any validation fails
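
As a rough illustration of these six steps, the Python sketch below follows the algorithm literally. It is not the platform's clean_linkedin_url implementation; the function name is hypothetical and the broad exception handler mirrors the "any exception → NULL" rule described under Error Handling.

```python
from typing import Optional

_MARKER = "linkedin.com/company/"


def clean_linkedin_company_sketch(url: Optional[str]) -> Optional[str]:
    """Return the canonical company URL, or None when validation fails."""
    try:
        if _MARKER not in url:  # step 1: validation check
            return None
        rest = url.split(_MARKER, 1)[1]  # step 2: URL splitting
        slug = rest.split("/")[0].split("?")[0]  # step 3: slug extraction
        if len(slug) < 2:  # step 4: length validation
            return None
        return f"https://www.linkedin.com/company/{slug}"  # step 5
    except Exception:
        return None  # step 6: null on failure (covers None and non-strings)


example = clean_linkedin_company_sketch("https://de.linkedin.com/company/bmw-group?trk=public")
```

Splitting on the marker rather than parsing the host is what makes locale and mobile subdomains (de., m.) disappear without special handling.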

Examples:

✓ Valid:

Input:

"https://www.linkedin.com/company/microsoft/"

Output:

"https://www.linkedin.com/company/microsoft"

Input:

"linkedin.com/company/google/about/"

Output:

"https://www.linkedin.com/company/google"

Input:

"https://de.linkedin.com/company/bmw-group?trk=public"

Output:

"https://www.linkedin.com/company/bmw-group"

Input:

"http://www.linkedin.com/company/apple"

Output:

"https://www.linkedin.com/company/apple"

✗ Invalid (becomes NULL):

Input:

"linkedin.com/school/stanford-university"

Reason: NOT /company/ URL

NULL

Input:

"https://www.linkedin.com/company/a"

Reason: Slug too short (1 char)

NULL

Input:

"linkedin.com/company/"

Reason: No slug

NULL

Input:

"https://www.linkedin.com/in/person-name"

Reason: Personal profile, not company

NULL

Input:

"https://facebook.com/company"

Reason: Not LinkedIn

NULL

IMPORTANT VALIDATION RULES:

• ONLY accepts /company/ URLs
• REJECTS /school/ URLs → NULL (despite earlier documentation suggesting otherwise)
• REJECTS personal profiles (/in/) → NULL
• REJECTS showcase pages (/showcase/) → NULL

Edge Cases:

• Locale Prefixes: Removed automatically (e.g., de.linkedin.com → www.linkedin.com)
• Mobile URLs: Handled (e.g., m.linkedin.com → www.linkedin.com)
• Query Parameters: All removed (e.g., ?trk=public, ?original_referer=)
• Trailing Slashes: Removed from slug
• Sub-pages: Removed (e.g., /about, /people, /jobs)

Slug Validation:

• Minimum length: 2 characters
• Can contain: letters, numbers, hyphens, underscores
• No validation of actual company existence on LinkedIn
• No case transformation (preserves original case)

Error Handling:

• Try-except block catches malformed URLs
• Any exception during processing → NULL
• Missing parts after split → NULL
• Empty slug after extraction → NULL

Common Rejection Scenarios:

Input Type	Example	Result	Reason
School page	`linkedin.com/school/stanford`	NULL	Not /company/
Personal profile	`linkedin.com/in/john-doe`	NULL	Not /company/
Showcase page	`linkedin.com/showcase/product`	NULL	Not /company/
Short slug	`linkedin.com/company/a`	NULL	Slug < 2 chars
No slug	`linkedin.com/company/`	NULL	Empty slug
Wrong platform	`xing.com/companies/test`	NULL	Not LinkedIn

Fields Cleaned:

• company_linkedin
• All LinkedIn identifiers in company_linkedins array

Impact on Matching:

Without Cleaning:

"https://de.linkedin.com/company/bmw?trk=public" ≠ "linkedin.com/company/bmw" → Creates duplicate

With Cleaning:

"https://de.linkedin.com/company/bmw?trk=public" = "https://www.linkedin.com/company/bmw" → Prevents duplicate

3. Email Cleaning

⚠️ NOTE: Email cleaning is NOT implemented in the current codebase.

Emails are handled through the pre-processing stage only (whitespace stripping and quote normalization).

Current Behavior:

• Whitespace is stripped (pre-processing)
• Quotes are normalized (pre-processing)
• No case transformation
• No validation
• Field passes through as-is after pre-processing

Actual vs Expected Behavior:

Actual (Current):

Input:

"Info@Company.COM"

Output:

"Info@Company.COM"

(no transformation)

Expected (Not Implemented):

Input:

"Info@Company.COM"

Expected Output:

"info@company.com"

(lowercase)

Fields Affected:

• company_email
• All email identifiers

To Implement:

• Lowercase conversion
• @ symbol validation
• Email format validation
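
Since this cleaning step does not exist yet, the following Python sketch only shows one possible shape for the three items above. Everything in it is an assumption: the function name, the choice to return None for invalid input, and the deliberately loose format check (one "@", text on both sides, a dot in the domain part).

```python
from typing import Optional


def clean_email_sketch(value: Optional[str]) -> Optional[str]:
    """Lowercase an email address and apply a loose format check."""
    if not isinstance(value, str):
        return None
    email = value.strip().lower()
    local, sep, domain = email.partition("@")
    # Require exactly one "@", a non-empty local part and a dotted domain.
    if not sep or not local or "@" in domain or "." not in domain.strip("."):
        return None
    return email


example = clean_email_sketch("Info@Company.COM")
```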

4. Phone Number Cleaning

⚠️ NOTE: Phone number cleaning is NOT implemented in the current codebase.

Phone numbers are handled through the pre-processing stage only (whitespace stripping and quote normalization).

Current Behavior:

• Whitespace is stripped (pre-processing)
• Quotes are normalized (pre-processing)
• No format transformation
• No validation
• Field passes through as-is after pre-processing

Actual vs Expected Behavior:

Actual (Current):

Input:

"+49 (30) 1234-5678"

Output:

"+49 (30) 1234-5678"

(no transformation)

Expected (Not Implemented):

Input:

"+49 (30) 1234-5678"

Expected Output:

"+493012345678"

(formatted)

Fields Affected:

• company_phone
• All phone identifiers

To Implement:

• Remove non-digit characters (except +)
• Normalize to E.164 format
• Add + prefix for international numbers
• Validate phone number format
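
The first item on this list is simple enough to sketch. The Python snippet below is hypothetical and not part of the current codebase: it removes every non-digit character and keeps a leading "+", which reproduces the expected output shown above. Full E.164 normalization (for example adding a missing country code) is not attempted here.

```python
import re
from typing import Optional


def clean_phone_sketch(value: Optional[str]) -> Optional[str]:
    """Keep digits and a leading plus sign; return None if no digits remain."""
    if not isinstance(value, str):
        return None
    has_plus = value.strip().startswith("+")
    digits = re.sub(r"\D", "", value)  # drop spaces, brackets, hyphens, etc.
    if not digits:
        return None
    return ("+" if has_plus else "") + digits


example = clean_phone_sketch("+49 (30) 1234-5678")
```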

5. Social Media URL Cleaning

Multiple functions clean different social media platforms to consistent formats. Each platform has strict validation rules and will set the field to NULL if validation fails.

Platform Output Formats & Special Notes:

Platform	Required Pattern	Output Format	Special Notes
Instagram	`instagram.com`	`https://www.instagram.com/{slug}`	Accepts any Instagram URL
Facebook	`facebook.com`	`https://www.facebook.com/{slug}`	Accepts any Facebook URL
Xing	`xing.com/pages/`	`https://www.xing.com/pages/{slug}`	ONLY /pages/ URLs
Pinterest	`pinterest.com`	`https://de.pinterest.com/{slug}`	Always German locale
TikTok	`tiktok.com/@`	`https://www.tiktok.com/@{slug}`	Requires @ symbol
YouTube	`youtube.com`	`https://www.youtube.com/{slug}`	Preserves path format (/c/, /channel/, /@, /user/)
Twitter/X	`x.com`	`https://x.com/{slug}`	ONLY x.com, NOT twitter.com

Common Algorithm (ALL Platforms):

1. Remove protocol (http://, https://)
2. Remove www. prefix
3. Remove query parameters and fragments
4. Remove trailing slashes
5. Convert to lowercase (except for case-sensitive platforms)
6. Keep platform-specific path structure

Instagram

Input:

"https://www.instagram.com/company/?hl=en"

Output:

"https://www.instagram.com/company"

Facebook

Input:

"https://www.facebook.com/Page/"

Output:

"https://www.facebook.com/Page"

Xing

Input:

"https://www.xing.com/pages/name"

Output:

"https://www.xing.com/pages/name"

⚠️ /companies/ → NULL

Pinterest

Input:

"https://www.pinterest.com/boards/"

Output:

"https://de.pinterest.com/boards"

🌍 Always German locale

TikTok

Input:

"https://www.tiktok.com/@name?lang=en"

Output:

"https://www.tiktok.com/@name"

⚠️ No @ → NULL

YouTube

Input:

"https://www.youtube.com/c/Channel"

Output:

"https://www.youtube.com/c/Channel"

Twitter/X

Input:

"https://twitter.com/Handle?ref_src=twsrc"

Output:

NULL

⚠️ Must use x.com

Error Recovery:

• No partial saves - invalid URLs become NULL
• No fallback attempts - strict validation
• No logging of failed URLs - silent NULL assignment
• Fields with NULL are removed from payload before database insertion

Detailed Platform Algorithms

Each platform has its own specialized cleaning function with unique validation rules.

Pinterest (`clean_pinterest`)

Purpose: Validate and normalize Pinterest profile URLs

Detailed Algorithm:

1. Validation Check: URL must contain pinterest.com
2. URL Splitting: Split URL by pinterest.com/
3. Slug Extraction: Take everything after pinterest.com/, remove trailing paths (split by /, take first part), remove query parameters (split by ?, take first part)
4. Length Validation: Slug must be at least 2 characters long
5. Reconstruction: Build URL as https://de.pinterest.com/{slug}
6. Null on Failure: Set to NULL if any validation fails
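
A literal reading of these steps looks roughly like the Python sketch below. It is an illustration under assumptions, not the real clean_pinterest code; the function name is hypothetical.

```python
from typing import Optional


def clean_pinterest_sketch(url: Optional[str]) -> Optional[str]:
    """Normalize a Pinterest profile URL to the de.pinterest.com form."""
    try:
        if "pinterest.com" not in url:  # validation check
            return None
        rest = url.split("pinterest.com/", 1)[1]  # raises IndexError without "/"
        slug = rest.split("/")[0].split("?")[0]
        if len(slug) < 2:
            return None
        return f"https://de.pinterest.com/{slug}"  # output always uses the de. host
    except Exception:
        return None  # malformed or non-string input


example = clean_pinterest_sketch("pinterest.com/nike/ideas?source=web")
```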

Examples:

✓ Valid:

Input:

"https://www.pinterest.com/company_boards/"

Output:

"https://de.pinterest.com/company_boards"

Input:

"pinterest.com/nike/ideas?source=web"

Output:

"https://de.pinterest.com/nike"

Input:

"https://de.pinterest.com/cocacola"

Output:

"https://de.pinterest.com/cocacola"

✗ Invalid (becomes NULL):

Input:

"https://www.pinterest.com/p"

Reason:

Slug too short: 1 char

Output:

NULL

Input:

"pinterest.com/"

Reason:

No slug

Output:

NULL

Special Note:

• Output ALWAYS uses de.pinterest.com (German locale)
• Input can be from any Pinterest locale (www, de, fr, etc.)
• This standardizes to German locale for consistency

Error Handling:

• Try-except block catches malformed URLs
• Any exception during processing → NULL

Fields Cleaned:

• company_pinterest

TikTok (`clean_tiktok`)

Purpose: Validate and normalize TikTok profile URLs

Detailed Algorithm:

1. Validation Check: URL must contain tiktok.com/@
2. URL Splitting: Split URL by tiktok.com/
3. Slug Extraction: Take everything after tiktok.com/, remove trailing paths (split by /, take first part), remove query parameters (split by ?, take first part)
4. Length Validation: Slug must be at least 2 characters long (includes @)
5. Reconstruction: Build URL as https://www.tiktok.com/{slug}
6. Null on Failure: Set to NULL if any validation fails

Examples:

✓ Valid:

Input:

"https://www.tiktok.com/@companyname?lang=en"

Output:

"https://www.tiktok.com/@companyname"

Input:

"tiktok.com/@nike/video/12345"

Output:

"https://www.tiktok.com/@nike"

Input:

"https://www.tiktok.com/@cocacola"

Output:

"https://www.tiktok.com/@cocacola"

✗ Invalid (becomes NULL):

Input:

"https://www.tiktok.com/companyname"

Reason:

No @ symbol

Output:

NULL

Input:

"https://www.tiktok.com/@a"

Reason:

Slug too short: 2 chars total including @

Output:

NULL

Input:

"tiktok.com/@"

Reason:

No username after @

Output:

NULL

Special Requirements:

• URL MUST contain tiktok.com/@
• The @ symbol is required and preserved in the slug
• Without @ symbol, URL is considered invalid → NULL

Error Handling:

• Try-except block catches malformed URLs
• Any exception during processing → NULL

Fields Cleaned:

• company_tiktok

YouTube (`clean_youtube`)

Purpose: Validate and normalize YouTube channel URLs

Detailed Algorithm:

1. Validation Check: URL must contain youtube.com
2. URL Splitting: Split URL by youtube.com/
3. Slug Extraction: Take everything after youtube.com/, remove trailing paths (split by /, take first part), remove query parameters (split by ?, take first part)
4. Length Validation: Slug must be at least 2 characters long
5. Reconstruction: Build URL as https://www.youtube.com/{slug}
6. Null on Failure: Set to NULL if any validation fails

Examples:

✓ Valid:

Input:

"https://www.youtube.com/c/CompanyChannel"

Output:

"https://www.youtube.com/c/CompanyChannel"

Input:

"youtube.com/channel/UCxxxxxx/videos"

Output:

"https://www.youtube.com/channel/UCxxxxxx"

Input:

"https://m.youtube.com/@CompanyName?feature=share"

Output:

"https://www.youtube.com/@CompanyName"

Input:

"https://www.youtube.com/user/OldUsername"

Output:

"https://www.youtube.com/user/OldUsername"

✗ Invalid (becomes NULL):

Input:

"https://www.youtube.com/c"

Reason:

Slug too short: 1 char

Output:

NULL

Input:

"youtube.com/"

Reason:

No slug

Output:

NULL

Input:

"https://vimeo.com/channel"

Reason:

No youtube.com

Output:

NULL

Supported YouTube URL Formats:

• /c/{channel-name} (custom channel URL)
• /channel/{channel-id} (channel ID)
• /@{handle} (new YouTube handle format)
• /user/{username} (legacy username)

Error Handling:

• Try-except block catches malformed URLs
• Any exception during processing → NULL

Fields Cleaned:

• company_youtube

Twitter/X (`clean_twitter`)

Purpose: Validate and normalize Twitter/X profile URLs

Detailed Algorithm:

1. Validation Check: URL must contain x.com (NEW Twitter branding)
2. URL Splitting: Split URL by x.com/
3. Slug Extraction: Take everything after x.com/, remove trailing paths (split by /, take first part), remove query parameters (split by ?, take first part)
4. Length Validation: Slug must be at least 2 characters long
5. Reconstruction: Build URL as https://x.com/{slug}
6. Null on Failure: Set to NULL if any validation fails
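
The sketch below applies these steps literally in Python. It is hypothetical (including the function name) and is meant to show one consequence of a plain substring check rather than to describe the production code.

```python
from typing import Optional


def clean_twitter_sketch(url: Optional[str]) -> Optional[str]:
    """Normalize an x.com profile URL; twitter.com input yields None."""
    try:
        if "x.com" not in url:  # validation check as a substring test
            return None
        rest = url.split("x.com/", 1)[1]
        slug = rest.split("/")[0].split("?")[0]
        if len(slug) < 2:
            return None
        return f"https://x.com/{slug}"
    except Exception:
        return None


example = clean_twitter_sketch("https://x.com/CompanyHandle?ref_src=twsrc")
# A substring test also matches unrelated hosts that end in "x.com".
false_positive = clean_twitter_sketch("netflix.com/title")
```

With a pure substring test, any host ending in "x.com" passes the first step. Whether the real function guards against this is not stated in this document.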

Examples:

✓ Valid:

Input:

"https://x.com/CompanyHandle?ref_src=twsrc"

Output:

"https://x.com/CompanyHandle"

Input:

"x.com/nike/status/12345"

Output:

"https://x.com/nike"

Input:

"https://www.x.com/cocacola"

Output:

"https://x.com/cocacola"

✗ Invalid (becomes NULL):

Input:

"https://twitter.com/CompanyHandle"

Reason:

Uses old twitter.com domain

Output:

NULL

Input:

"https://x.com/a"

Reason:

Slug too short: 1 char

Output:

NULL

Input:

"x.com/"

Reason:

No handle

Output:

NULL

IMPORTANT NOTES:

• ONLY accepts x.com (new Twitter branding)
• REJECTS twitter.com URLs → NULL
• This is a strict migration to X branding
• Old twitter.com URLs will be marked as invalid

Migration Impact:

• Existing twitter.com URLs in database will be marked as NULL during cleaning
• Users must provide x.com URLs for validation to pass
• This enforces the Twitter → X rebranding

Error Handling:

• Try-except block catches malformed URLs
• Any exception during processing → NULL

Fields Cleaned:

• company_twitter

6. Company Name Cleaning

⚠️ NOTE: Company name cleaning is NOT implemented in the current codebase.

Company names are handled through the pre-processing stage only (whitespace stripping and quote normalization).

Current Behavior:

• Whitespace is stripped (pre-processing)
• Quotes are normalized (pre-processing)
• No legal form removal
• No case transformation
• Field passes through as-is after pre-processing

Actual vs Expected Behavior:

Actual (Current):

Input:

" Company Name GmbH "

Output:

"Company Name GmbH"

(whitespace stripped only)

Expected (Not Implemented):

Input:

" Company Name GmbH "

Expected Output:

"Company Name"

(legal form removed)

Fields Affected:

• company_name_cleaned
• All name identifiers

Validation:

None currently implemented

To Implement:

• Remove legal form suffixes (GmbH, AG, Inc., Ltd., etc.)
• Normalize spacing
• Title case conversion
• Handle compound legal forms

7-11. Other Company Fields

⚠️ NOTE: The following cleaning functions (7-11) are NOT implemented in the current codebase.

All these fields are handled through the pre-processing stage only (whitespace stripping and quote normalization).

Fields Affected:

• company_legal_form - No standardization or mapping
• company_street - No normalization
• company_city - No title case conversion
• company_zip - No format normalization
• company_region - No transformation
• company_country - No ISO code conversion

• company_tags - No lowercase or deduplication
• company_sources - No lowercase or deduplication
• company_employees_research - No integer conversion
• company_employees_linkedin - No integer conversion
• company_founded_year - No integer conversion
• company_linkedin_followers - No integer conversion

Current Behavior:

• ALL these fields pass through with only whitespace stripping and quote normalization
• No validation
• No format transformation
• No data normalization beyond pre-processing

Examples of Current Behavior:

Legal Form

Input:

"gmbh"

Current Output:

"gmbh"

(no transformation)

Expected:

"GMBH"

(not implemented)

City

Input:

" münchen "

Current Output:

"münchen"

(whitespace stripped)

Expected:

"München"

(not implemented)

ZIP Code

Input:

"10 115"

Current Output:

"10 115"

(no transformation)

Expected:

"10115"

(not implemented)

Contact Field Cleaning

The clean_contact_fields(data: dict) → dict function processes contact/lead data through multiple stages.

Pre-Processing Stage (Applied to ALL Fields)

Before any field-specific cleaning, ALL string fields undergo standardization:

1. Quote Normalization (normalize_quotes)

Same as company cleaning - converts all Unicode quote characters to standard ASCII

2. Empty Field Removal

Same as company cleaning - removes None, "", and whitespace-only fields

Note:

Contact cleaning does NOT include explicit whitespace stripping in the pre-processing loop (only quote normalization), but whitespace is still handled during field-specific cleaning.

Order of Operations:

1. Normalize quotes in all string fields
2. Apply field-specific cleaning (LinkedIn, Xing, etc.)
3. Remove all None/empty fields from payload

1. Email Cleaning (`clean_email`)

Note:

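Strips surrounding whitespace and converts the address to lowercase, as the examples below show. Note that the company email section above describes company email cleaning as not implemented.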

Examples:

Input:

"John.Doe@Company.COM"

Output:

"john.doe@company.com"

Input:

" info+sales@EXAMPLE.de "

Output:

"info+sales@example.de"

Fields Cleaned:

• contact_email_valid
• contact_email_invalid
• contact_email_catch_all
• contact_email_unsure
• All email identifiers in respective status arrays

2. LinkedIn Profile Cleaning (`clean_linkedin_profile`)

Purpose: Validate and normalize LinkedIn personal profile URLs

Detailed Algorithm:

1. Validation Check: URL must contain linkedin.com/in/
2. URL Splitting: Split URL by linkedin.com/in/
3. Slug Extraction: Take everything after linkedin.com/in/, remove trailing paths (split by /, take first part), remove query parameters (split by ?, take first part)
4. Length Validation: Slug must be at least 2 characters long
5. Reconstruction: Build URL as https://www.linkedin.com/in/{slug}
6. Null on Failure: Set to NULL if any validation fails

Examples:

✓ Valid:

Input:

"https://www.linkedin.com/in/john-doe/"

Output:

"https://www.linkedin.com/in/john-doe"

Input:

"linkedin.com/in/jane-smith-12345678?trk=profile"

Output:

"https://www.linkedin.com/in/jane-smith-12345678"

Input:

"https://de.linkedin.com/in/max-mustermann"

Output:

"https://www.linkedin.com/in/max-mustermann"

Input:

"http://m.linkedin.com/in/person-name/details"

Output:

"https://www.linkedin.com/in/person-name"

✗ Invalid (becomes NULL):

Input:

"https://www.linkedin.com/company/microsoft"

Reason:

Company page, not /in/

Output:

NULL

Input:

"https://www.linkedin.com/in/a"

Reason:

Slug too short: 1 char

Output:

NULL

Input:

"linkedin.com/in/"

Reason:

No slug

Output:

NULL

Input:

"https://www.linkedin.com/pub/john-doe/12/345/678"

Reason:

Old /pub/ format

Output:

NULL

Input:

"https://xing.com/profile/person"

Reason:

Not LinkedIn

Output:

NULL

IMPORTANT VALIDATION RULES:

• ONLY accepts /in/ URLs (personal profiles)
• REJECTS /company/ URLs → NULL
• REJECTS /school/ URLs → NULL
• REJECTS /pub/ URLs → NULL (old public profile format)

Edge Cases:

• Locale Prefixes: Removed automatically (e.g., de.linkedin.com → www.linkedin.com)
• Mobile URLs: Handled (e.g., m.linkedin.com → www.linkedin.com)
• Query Parameters: All removed (e.g., ?trk=profile, ?originalSubdomain=de)
• Trailing Slashes: Removed from slug
• Sub-pages: Removed (e.g., /details, /recent-activity)

Slug Validation:

• Minimum length: 2 characters
• Can contain: letters, numbers, hyphens, underscores
• Supports vanity URLs (e.g., john-doe) and numeric IDs (e.g., person-12345678)
• No validation of actual profile existence on LinkedIn
• No case transformation (preserves original case)

Error Handling:

• Try-except block catches malformed URLs
• Any exception during processing → NULL
• Missing parts after split → NULL
• Empty slug after extraction → NULL

Impact on Matching:

Without Cleaning

"https://de.linkedin.com/in/john-doe?trk=profile"

≠

"linkedin.com/in/john-doe"

→ Creates duplicate

With Cleaning

"https://www.linkedin.com/in/john-doe"

→ Prevents duplicate

Common Rejection Scenarios:

Input Type	Example	Result	Reason
Company page	`linkedin.com/company/microsoft`	NULL	Not /in/
School page	`linkedin.com/school/stanford`	NULL	Not /in/
Public profile	`linkedin.com/pub/john-doe/1/2/3`	NULL	Not /in/
Short slug	`linkedin.com/in/a`	NULL	Slug < 2 chars
No slug	`linkedin.com/in/`	NULL	Empty slug
Wrong platform	`xing.com/profile/test`	NULL	Not LinkedIn

Fields Cleaned:

• contact_linkedin
• All LinkedIn identifiers in contact_linkedins array

Name Cleaning

First Name (clean_first_name)

Input:

" john-paul "

Output:

"John-Paul"

Input:

"marie-josé"

Output:

"Marie-José"

Last Name (clean_last_name)

Input:

"von müller"

Output:

"Von Müller"

Input:

"o'brien"

Output:

"O'Brien"

Fields Cleaned:

• contact_first_name_cleaned
• contact_last_name_cleaned
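
The four examples above are consistent with capitalizing every run of letters after trimming whitespace. The Python sketch below reproduces them under that assumption; it is not the actual clean_first_name or clean_last_name code, and clean_name_sketch is a hypothetical name.

```python
import re

# A run of letters (Unicode-aware), excluding digits and underscores.
_LETTER_RUN = re.compile(r"[^\W\d_]+")


def clean_name_sketch(value: str) -> str:
    """Trim the value and capitalize each run of letters."""
    return _LETTER_RUN.sub(lambda m: m.group(0).capitalize(), value.strip())


first = clean_name_sketch(" john-paul ")
last = clean_name_sketch("o'brien")
```

This simple rule has known limits: it would turn "McDonald" into "Mcdonald" and capitalize the letter after every apostrophe, so a real implementation may need exceptions.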

Field Normalization

Gender Normalization

Maps variations to: male, female, diverse, unknown

• "M" → "male"
• "weiblich" → "female"
• "non-binary" → "diverse"

Language Normalization

Converts to ISO 639-1 codes

• "German" → "de"
• "english" → "en"
• "français" → "fr"

Academic Title Normalization

Standardizes academic titles

• "dr." → "DR"
• "prof. dr." → "PROF DR"
• "ph.d." → "PHD"

Seniority Normalization

Standard levels: entry, mid, senior, manager, director, vp, c-level

• "Senior" → "senior"
• "VP of Sales" → "vp"
• "C-Level" → "c-level"

Department Normalization

Standard departments: sales, marketing, it, hr, finance, operations, rd, customer_success

• "Sales & Marketing" → "sales"
• "IT / Technology" → "it"
• "HR" → "hr"

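Normalizations like the gender and language mappings above are commonly written as case-insensitive lookup tables. The Python sketch below uses only the example pairs from this section; the helper name, the lowercase lookup and the fallback values are assumptions, and free-text cases such as "VP of Sales" would need keyword matching that is not shown here.

```python
from typing import Optional

# Only the example pairs from this section; real tables would be larger.
GENDER_MAP = {"m": "male", "weiblich": "female", "non-binary": "diverse"}
LANGUAGE_MAP = {"german": "de", "english": "en", "fran\u00e7ais": "fr"}


def normalize_with_map(
    value: str, mapping: dict[str, str], default: Optional[str] = None
) -> Optional[str]:
    """Look up a trimmed, lowercased value and fall back to a default."""
    return mapping.get(value.strip().lower(), default)


gender = normalize_with_map("M", GENDER_MAP, default="unknown")
language = normalize_with_map("German", LANGUAGE_MAP)
```
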
Validation and Null Replacement Rules

All Social Media & Profile URL Cleaning

Common rules that apply to ALL URL cleaning functions:

✓ Validation Rules

1. Minimum slug length: 2 characters
2. Must contain correct domain
3. Must match expected path pattern
4. Exception handling active
5. No empty slugs allowed

✗ Results in NULL

• Slug shorter than 2 chars
• Wrong platform domain
• Wrong path pattern
• Processing exception
• Empty slug after extraction

Strict Platform-Specific Requirements

Platform	Required Pattern	Rejects	Output on Invalid
Company LinkedIn	`linkedin.com/company/`	/school/, /in/, /showcase/	NULL
Contact LinkedIn	`linkedin.com/in/`	/company/, /school/, /pub/	NULL
Company Xing	`xing.com/pages/`	/people/, /profile/, /companies/	NULL
Contact Xing	`xing.com/people/`	/pages/, /profile/	NULL
Instagram	`instagram.com`	Any non-Instagram domain	NULL
Facebook	`facebook.com`	Any non-Facebook domain	NULL
Pinterest	`pinterest.com`	Any non-Pinterest domain	NULL
TikTok	`tiktok.com/@`	URLs without @ symbol	NULL
YouTube	`youtube.com`	Any non-YouTube domain	NULL
Twitter/X	`x.com`	twitter.com (old domain)	NULL

Key Insights

NULL Behavior

• Invalid URLs become NULL, NOT empty string
• NULL fields are removed from payload
• No partial saves or fallback attempts
• Failed validations are silent (no errors)

Validation Strictness

• Twitter must be x.com (NOT twitter.com)
• TikTok must have @ symbol
• Xing company vs personal paths are different
• LinkedIn company vs personal paths are different

Cleaning Impact on Matching

Why Cleaning Matters for Deduplication

Scenario 1: Company Domain Matching

Without Cleaning

Record 1: "https://www.Example.COM/about"

Record 2: "example.com"

→ MISMATCH (creates duplicate)

With Cleaning

Record 1: "example.com"

Record 2: "example.com"

→ MATCH (prevents duplicate)

Scenario 2: LinkedIn Profile Matching

Without Cleaning

Record 1: "https://de.linkedin.com/in/john-doe?trk=profile"

Record 2: "linkedin.com/in/john-doe/"

→ MISMATCH (creates duplicate)

With Cleaning

Record 1: "linkedin.com/in/john-doe"

Record 2: "linkedin.com/in/john-doe"

→ MATCH (prevents duplicate)

Scenario 3: Email Matching

Without Cleaning

Record 1: "John.Doe@Company.COM"

Record 2: "john.doe@company.com"

→ MISMATCH (creates duplicate)

With Cleaning

Record 1: "john.doe@company.com"

Record 2: "john.doe@company.com"

→ MATCH (prevents duplicate)
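Scenarios 1 and 3 can be reproduced with two small normalizers. This is a sketch under assumptions: `clean_domain` and `clean_email` are hypothetical names, and the real cleaning functions may apply more rules than lowercasing, stripping the scheme, `www.` prefix and path.

```python
from typing import Optional
from urllib.parse import urlsplit


def clean_domain(raw: str) -> Optional[str]:
    """Reduce a URL or bare domain to a lowercase host without 'www.'."""
    text = raw.strip().lower()
    if "://" not in text:
        text = "//" + text  # makes urlsplit treat a bare domain as the host
    host = urlsplit(text).hostname or ""
    if host.startswith("www."):
        host = host[len("www."):]
    return host or None


def clean_email(raw: str) -> Optional[str]:
    """Lowercase and trim an email; None if it has no local part or domain."""
    text = raw.strip().lower()
    local, sep, domain = text.partition("@")
    return text if sep and local and domain else None


# Both records from Scenario 1 collapse to the same key.
same_domain = clean_domain("https://www.Example.COM/about") == clean_domain("example.com")
# Both records from Scenario 3 collapse to the same key.
same_email = clean_email("John.Doe@Company.COM") == clean_email("john.doe@company.com")
```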

Company Lookup Priority

1. Domain (highest priority - most unique)
2. LinkedIn URL
3. Email
4. Phone
5. Company name + address

Contact Lookup Priority

1. Email (highest priority - most unique)
2. LinkedIn profile URL
3. Xing profile URL
4. First + Last name + Company

Data Quality Benefits

Consistency

All data stored in uniform format

Searchability

Easier to query and filter

Matching Accuracy

95%+ reduction in false negatives

Storage Efficiency

Eliminates redundant variations

API Performance

Faster comparison operations

User Experience

Predictable data format in responses

Application Scope

✓ Field Cleaning is Applied In:

✓ All lookup operations (company_lookup, contact_lookup)
✓ All push operations (company_push, contact_push)
✓ All push_patch operations (company_push_patch, contact_push_patch)
✓ CSV upload processing

✗ NOT Applied In:

✗ GET operations (data already cleaned in database)
✗ DELETE operations (no new data)

Order of Execution

1. Receive API request with raw data: incoming payload from client application
2. Apply pre-processing: whitespace stripping, quote normalization, empty field removal
3. Apply field-specific cleaning: domain, LinkedIn, email, phone, social media, etc.
4. Remove NULL/empty fields from payload: final cleanup before database operation
5. Proceed to database lookup/insert/update: cleaned data ready for database operations
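Steps 2 to 4 of this order can be sketched as a small pipeline. The names `preprocess`, `clean_payload` and the `cleaners` mapping are hypothetical, and "quote normalization" is assumed here to mean replacing typographic quotes with ASCII quotes; the real implementation may differ.

```python
from typing import Any, Callable, Dict, Optional

Cleaner = Callable[[str], Optional[str]]


def preprocess(value: Any) -> Any:
    """Step 2: normalize quotes (assumed meaning) and strip whitespace."""
    if not isinstance(value, str):
        return value
    for fancy, plain in (("\u201c", '"'), ("\u201d", '"'), ("\u2018", "'"), ("\u2019", "'")):
        value = value.replace(fancy, plain)
    return value.strip()


def clean_payload(payload: Dict[str, Any], cleaners: Dict[str, Cleaner]) -> Dict[str, Any]:
    """Run pre-processing, then field-specific cleaning, then drop NULL/empty fields."""
    cleaned: Dict[str, Any] = {}
    for field, raw in payload.items():
        value = preprocess(raw)  # step 2
        if value is None or value == "":
            continue  # empty field removal
        cleaner = cleaners.get(field)
        if cleaner is not None:
            value = cleaner(value)  # step 3: field-specific cleaning
        if value is None or value == "":
            continue  # step 4: NULL results never reach the database
        cleaned[field] = value
    return cleaned
```

A field whose cleaner returns `None` simply disappears from the payload, which matches the silent NULL behavior described earlier.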

Summary

Company Cleaning Functions: 11

1. Domain cleaning
2. LinkedIn URL cleaning
3. Email cleaning
4. Phone cleaning
5. Instagram cleaning
6. Facebook cleaning
7. Xing cleaning
8. Pinterest cleaning
9. TikTok cleaning
10. YouTube cleaning
11. Twitter cleaning

Contact Cleaning Functions: 10

1. Email cleaning
2. LinkedIn profile cleaning
3. Xing profile cleaning
4. First name cleaning
5. Last name cleaning
6. Gender normalization
7. Language normalization
8. Academic title normalization
9. Position cleaning
10. Seniority/Department normalization

Applied In:

• All lookup operations (company_lookup, contact_lookup)
• All push operations (company_push, contact_push)
• All push_patch operations (company_push_patch, contact_push_patch)

Result:

• Consistent data format across entire database
• Accurate deduplication and matching
• Improved data quality and reliability
• Better user experience with predictable outputs

API Logic

This section explains the logic flow for all API endpoints. Each subsection describes what happens when an API endpoint is called, which functions are used, and what those functions do.

Rate Limits

Comprehensive information about rate limiting, queue systems, and connection pooling for all API endpoints.

Overall API Rate Limits Analysis

⚠️

Important: No Global Rate Limit

There is NO GLOBAL RATE LIMIT across all endpoints. Each endpoint has its own independent rate limit per IP address.

Single IP Maximum: 28,000 requests/second (across all endpoints)

Per Minute: 1,680,000 requests/minute

Multiple IPs: UNLIMITED (each IP has independent limits)

Complete Rate Limits by Endpoint

Endpoint	Rate Limit	Requests/Second
/company_lookup	1000/second	1,000
/company_push	5000/second	5,000
/company_push_patch	5000/second	5,000
/contact_lookup	1000/second	1,000
/contact_push	5000/second	5,000
/contact_push_patch	5000/second	5,000
/csv_upload	1000/second	1,000
/company_delete_fields	1000/second	1,000
/contact_delete_fields	1000/second	1,000
/company_get	1000/second	1,000
/contact_get	1000/second	1,000
/health	No limit	Unlimited
/redis_status	No limit	Unlimited
/job_status/{job_id}	No limit	Unlimited
/queue_stats	No limit	Unlimited

Actual System Bottlenecks

⚠️ Critical: Rate Limiter is NOT the Bottleneck!

Rate limiter allows (write endpoints): 5,000 req/sec

Actual system capacity (writes): ~20 req/sec

Gap: Rate limiter is 250× higher than actual capacity!

1. Queue System (Write Endpoints)

Max Queue Size: 5,000 jobs

Workers: 50 threads

Avg Processing: 2-3 sec/job

Throughput: ~20 req/sec

Burst Capacity: 5,000 jobs

2. Database Connection Pool

Max Connections: 120

Avg Query Time: 50-120ms

Throughput: ~1,500 queries/sec

Request Capacity: ~750 req/sec (avg 2 queries per request)
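The throughput figures above follow from simple capacity arithmetic (concurrency divided by time per unit of work). The sketch below reproduces them, assuming the midpoints of the stated ranges: 2.5 seconds per job and 80 ms per query. The function names are illustrative only.

```python
def worker_throughput(workers: int, avg_job_seconds: float) -> float:
    """Jobs per second a fixed worker pool can sustain."""
    return workers / avg_job_seconds


def pool_query_throughput(connections: int, avg_query_ms: float) -> float:
    """Queries per second a connection pool can sustain."""
    return connections * 1000 / avg_query_ms


def burst_drain_seconds(queue_size: int, jobs_per_second: float) -> float:
    """Time needed to work off a full queue at the sustained rate."""
    return queue_size / jobs_per_second


writes_per_sec = worker_throughput(50, 2.5)          # 20.0
queries_per_sec = pool_query_throughput(120, 80)     # 1500.0
requests_per_sec = queries_per_sec / 2               # 750.0 (2 queries per request)
drain_seconds = burst_drain_seconds(5000, writes_per_sec)  # 250.0, about 4 minutes
```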

Maximum Requests Summary

Measure	Single IP	System-Wide (All IPs)
Rate Limiter Allows	28,000 req/sec	Unlimited
Queue Can Accept (writes)	20 req/sec	20 req/sec
Database Can Handle	~750 req/sec	~750 req/sec
Actual Capacity (writes)	20 req/sec	20 req/sec
Actual Capacity (reads)	1,000 req/sec	Unlimited

Per Minute	Single IP	System-Wide (All IPs)
Rate Limiter Allows	1,680,000 req/min	Unlimited
Actual Capacity (writes)	1,200 req/min	1,200 req/min
Actual Capacity (reads)	60,000 req/min	Unlimited

Real-World System Capacity Scenarios

Scenario 1: Read-Only Traffic (Lookups & Gets)

Endpoints:

/company_lookup (cached 90%)
/contact_lookup (cached 90%)
/company_get
/contact_get

With 90% cache hit rate:

Cached requests: ~10ms response time

DB requests (10%): ~80ms response time

Example: 1,000 req/sec incoming

• 900 req/sec cached (no DB needed)

• 100 req/sec hit database

• 100 × 0.08 sec = 8 concurrent connections

Bottleneck: Rate limiter (intentional throttling)

Maximum: 1,000 req/sec per IP per endpoint

System-wide: UNLIMITED (multiple IPs)

Scenario 2: Write Traffic (Push/Patch)

Endpoints:

/company_push
/company_push_patch
/contact_push
/contact_push_patch

Queue workers: 50

Processing time: 2-3 sec per job

Throughput: ~20 jobs/sec

Burst capacity: 5,000 jobs

Burst duration: ~4 minutes

Bottleneck: Background workers

Rate limiter allows: 5,000 req/sec per IP

System can handle: ~20 req/sec sustained

Gap: 250× higher than actual capacity!

Scenario 3: Mixed Traffic (Typical Production)

Typical production load distribution:

Read traffic (lookups/gets): 80%

Write traffic (push/patch): 20%

Example: 100 req/sec total

• 80 req/sec reads → ~8 req/sec reach the database (90% cached), which needs roughly one connection at ~80ms per request

• 20 req/sec writes → All 50 workers busy

Result: System at capacity (workers saturated)

Bottleneck: Workers (20 req/sec write limit)

Key Insights

1. Rate Limiter is NOT the Bottleneck

The rate limiter is 250× HIGHER than actual capacity for write operations.

Why? Rate limiter prevents abuse, but workers prevent overload.

2. Different Limits for Different Operations

Read operations (lookups, gets):

Rate limiter: 1,000 req/sec per IP
Database: ~750 req/sec (with caching much higher)
Bottleneck: Rate limiter (intentional)

Write operations (push, patch):

Rate limiter: 5,000 req/sec per IP
Workers: ~20 req/sec
Bottleneck: Workers (need to scale)

3. No Global Rate Limit

Each endpoint has independent limits. A client can simultaneously:

Send 1,000 req/sec to /company_lookup
AND 5,000 req/sec to /company_push
AND 1,000 req/sec to /contact_lookup
... all from the same IP!

Total: 28,000 req/sec from single IP (rate limiter allows)

But system will return queue_full at ~20 req/sec for writes.

Scaling Recommendations

Option 1: Add Global Rate Limit

@app.middleware("http")
async def global_rate_limit(...)

Limit: 100 req/sec per IP across ALL endpoints. More realistic than 28,000 req/sec.
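The counting logic behind such a global limit could look like the following framework-agnostic sketch. It uses a fixed one-second window per IP; `GlobalRateLimiter` and `allow` are hypothetical names, and wiring the check into the `@app.middleware("http")` hook (returning HTTP 429 when `allow` is false) is not shown.

```python
import time
from typing import Callable, Dict, Tuple


class GlobalRateLimiter:
    """Fixed-window counter: at most `limit_per_second` requests per IP per second."""

    def __init__(self, limit_per_second: int = 100,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit_per_second
        self.clock = clock
        self._windows: Dict[str, Tuple[int, int]] = {}  # ip -> (window start, count)

    def allow(self, client_ip: str) -> bool:
        window = int(self.clock())
        start, count = self._windows.get(client_ip, (window, 0))
        if start != window:
            start, count = window, 0  # a new second begins, reset the counter
        if count >= self.limit:
            self._windows[client_ip] = (start, count)
            return False  # caller should answer with HTTP 429
        self._windows[client_ip] = (start, count + 1)
        return True
```

Because the counter is keyed only by IP, it applies across all endpoints at once, unlike the per-endpoint limits. Like the existing in-memory limiter storage, this state is per process and is not shared between instances.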

Option 2: Scale Workers

start_workers(num_workers=200)

New capacity: ~80 req/sec (4× improvement)

Option 3: Horizontal Scaling

Deploy multiple API instances:

Instance 1: 50 workers = 20 req/sec
Instance 2: 50 workers = 20 req/sec
Instance 3: 50 workers = 20 req/sec

Total: 60 req/sec capacity

Option 4: Lower Rate Limits

@limiter.limit("50/second")

Match rate limits to actual capacity instead of 5000/second

Rate Limiting System (Per Endpoint)

Implementation: SlowAPI library with in-memory storage. Rate limits are applied per IP address with independent limits for each client.

Endpoint	Rate Limit	Requests/Second	Purpose
/company_lookup	1000/second	1,000	Read-only lookup
/company_push	5000/second	5,000	Write operations
/company_push_patch	5000/second	5,000	Write operations
/contact_lookup	1000/second	1,000	Read-only lookup
/contact_push	5000/second	5,000	Write operations
/contact_push_patch	5000/second	5,000	Write operations
/csv_upload	1000/second	1,000	Job submission
/company_delete_fields	1000/second	1,000	Delete operations
/contact_delete_fields	1000/second	1,000	Delete operations
/company_get	1000/second	1,000	Read operations
/contact_get	1000/second	1,000	Read operations

Rate Limit Exceeded Response (HTTP 429)

{
  "error": "Rate limit exceeded: 1000 per 1 second",
  "detail": "Too many requests"
}

Why Different Limits?

1,000 req/sec: Fast database reads with caching, controlled deletions, job submission only
5,000 req/sec: Write operations need burst capacity for bulk imports; queue system provides backpressure control

Queue System (Background Job Processing)

Max Queue Size

5,000 jobs

Background Workers

50 threads

Job Timeout

55 seconds

Queue Full Response

{
  "error": "queue_full",
  "message": "System at capacity, please retry in a few seconds",
  "queue_size": 5001,
  "retry_after_seconds": 5,
  "max_queue_size": 5000
}
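On the client side, a `queue_full` response can be handled by waiting for `retry_after_seconds` and resubmitting. The helper below is a sketch: `submit_with_backoff` and the `send` callable (which performs the HTTP request and returns the parsed JSON body) are assumptions, not part of the API.

```python
import time
from typing import Any, Callable, Dict


def submit_with_backoff(send: Callable[[], Dict[str, Any]],
                        max_attempts: int = 5,
                        sleep: Callable[[float], None] = time.sleep) -> Dict[str, Any]:
    """Resubmit while the API answers queue_full, honoring retry_after_seconds."""
    response: Dict[str, Any] = {}
    for attempt in range(max_attempts):
        response = send()
        if response.get("error") != "queue_full":
            return response  # success, timeout, or another error: stop retrying
        if attempt < max_attempts - 1:
            sleep(response.get("retry_after_seconds", 5))
    return response  # still queue_full after the last attempt
```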

Timeout Response

{
  "error": "timeout",
  "job_id": "abc-123-def-456",
  "message": "Request timeout, poll /job_status/{job_id}"
}

Request Flow:

Request arrives
Rate limiter checks: Under limit for this IP? → If NO, return 429 error
Queue size check: Queue under 5,000? → If NO, return queue_full response
Enqueue job with UUID
Wait for result (55 seconds with automatic retry)
Background worker processes job
Return result or timeout with job_id for polling

Queue Architecture

API Request (incoming HTTP request)
→ Rate Limiter Check (1000 or 5000 req/sec per IP)
→ Queue Size Check (is queue < 5000 jobs?)
   FULL → Return queue_full (error response)
→ Enqueue Job (create UUID & event)
→ Wait for Result (55 seconds timeout, connection held open)
   COMPLETE → Return Result (success response)
   TIMEOUT → Return job_id (poll for status)

Job Queue (in-memory FIFO, max size: 5,000 jobs) → Workers #1 to #50 (processing) → Database Operations (via connection pool)

Queue Backpressure (Prevents Overload)

The queue size check acts as backpressure. When the queue fills up (5,000 jobs), new requests are rejected with a queue_full error. This prevents memory exhaustion and maintains system stability during traffic spikes.
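A bounded queue gives this backpressure almost for free. The sketch below uses the standard library's thread-safe `queue.Queue`; whether the service uses this class or an asyncio queue is not stated in this document, and `enqueue_job` is a hypothetical name. The error body mirrors the documented `queue_full` response.

```python
import queue
import uuid
from typing import Any, Dict

MAX_QUEUE_SIZE = 5000
job_queue: "queue.Queue[Dict[str, Any]]" = queue.Queue(maxsize=MAX_QUEUE_SIZE)


def enqueue_job(payload: Dict[str, Any],
                q: "queue.Queue[Dict[str, Any]]" = job_queue) -> Dict[str, Any]:
    """Enqueue a job, or reject it immediately when the queue is full."""
    job = {"job_id": str(uuid.uuid4()), "payload": payload}
    try:
        q.put_nowait(job)  # never blocks; raises queue.Full at capacity
    except queue.Full:
        return {
            "error": "queue_full",
            "message": "System at capacity, please retry in a few seconds",
            "queue_size": q.qsize(),
            "retry_after_seconds": 5,
            "max_queue_size": q.maxsize,
        }
    return {"job_id": job["job_id"]}
```

Rejecting at enqueue time keeps memory use bounded: the queue can never hold more than `MAX_QUEUE_SIZE` payloads, no matter how fast requests arrive.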

Queue System: Detailed Overview

System Overview

Acceptance Rate: 5,000 req/sec (maximum requests the API can accept)

Processing Rate: ~1,000 req/sec (actual throughput, DB connection limited)

The asynchronous queue system allows accepting up to 5× more requests than the system can process, providing a buffer for traffic spikes while maintaining stable processing rates.

Request Outcomes

Scenario A: Fast Processing (< 30 seconds)

Request:

POST /company_push_patch
{
  "company_name": "Example Corp",
  "company_domain": "example.com"
}

Response (within 30s):

{
  "company_main_id": "uuid-123",
  "company_workspace_id": "uuid-456",
  "status_company": "created",
  "status_company_workspace": null,
  "error": null,
  "input": {
    "company_name": "Example Corp",
    "company_domain": "example.com"
  }
}

✓ Job completed immediately - feels synchronous to the client

Scenario B: Timeout (> 30 seconds)

Initial Response (after 30s timeout):

{
  "error": "timeout",
  "message": "Job not completed within 30 seconds",
  "job_id": "abc-123-def-456",
  "status": "queued",
  "queue_position": 142,
  "estimated_wait_seconds": 14.2,
  "check_status_url": "/job_status/abc-123-def-456"
}

Then poll for result:

GET /job_status/abc-123-def-456

{
  "job_id": "abc-123-def-456",
  "status": "completed",
  "result": {
    "company_main_id": "uuid-123",
    "company_workspace_id": "uuid-456",
    "status_company": "created",
    "status_company_workspace": null,
    "error": null,
    "input": { "company_name": "Example Corp", "company_domain": "example.com" }
  },
  "metadata": {
    "job_type": "company_push_patch",
    "queued_at": "2025-10-11T12:00:00Z",
    "started_at": "2025-10-11T12:00:30Z",
    "completed_at": "2025-10-11T12:00:31Z",
    "status": "completed",
    "worker_id": 3
  }
}

⚠ Job queued - client needs to poll for result

Endpoint Classification

Queued Endpoints (5,000 req/sec)

These endpoints use the queue system:

• POST /company_push
• POST /company_push_patch
• POST /contact_push
• POST /contact_push_patch

Direct Endpoints (1,000 req/sec)

These remain synchronous (no queue):

• POST /company_lookup (read-only, fast)
• POST /contact_lookup (read-only, fast)

Performance Under Different Load Conditions

Light Load (0-1,000 req/sec)

• All requests complete within 30s
• No timeouts
• Immediate responses
• User experience: Synchronous feel

Medium Load (1,000-3,000 req/sec)

• Most requests complete within 30s
• Occasional timeouts for burst traffic
• 90%+ immediate responses
• User experience: Mostly synchronous

Heavy Load (3,000-5,000 req/sec)

• Many timeouts (jobs queued > 30s)
• Clients need to poll for results
• 100% acceptance rate (no rejections)
• User experience: Async with polling

Overload (>5,000 req/sec)

• Rate limiter kicks in
• Requests beyond 5,000/sec get HTTP 429
• Still better than before (was failing at 200/sec)
• User experience: Rate limit errors

Before vs After Queue System

Metric	Before (Synchronous)	After (Queue System)
Rate limit	1,000/sec	5,000/sec
Connection pool	20	120
Actual capacity	~200 req/sec	~1,000 req/sec
Acceptance rate	~200 req/sec	5,000 req/sec
Failures at 1,000 req/sec	80%	0%

Database Connection Pool

Min Connections

10

Always kept alive in pool

Max Connections

120

Matches Supabase Pro tier limit

Connection Pool Benefits:

Performance: Reuses connections (no TCP handshake overhead)
Resource Management: Limits database connections, prevents exhaustion
Thread Safety: Multiple workers can request connections concurrently

Why 120 connections?

Supports 1,000 req/sec with avg 120ms query time: 1000 × 0.12 = 120 concurrent queries

ThreadedConnectionPool (psycopg2): application-layer connection management
• Available connections (10-120): Conn 1, Conn 2, Conn 3, ... Conn N
• Checked out and returned via getconn() / putconn()
• In-use connections (0-120): each one held by an application query
→ TCP connection
Supabase Connection Pooler (aws-1-eu-central-1.pooler.supabase.com): PgBouncer, session pooling, max 1000 connections
→ PostgreSQL Database (Supabase hosted)

Connection Lifecycle

Workers request connections via getconn(), use them for database operations, then return them via putconn(). Connections are reused, avoiding the overhead of establishing new TCP connections for each query.
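One common way to make sure every `getconn()` is paired with a `putconn()` is a small context manager. The helper name `pooled_connection` is hypothetical; it works with any pool object exposing `getconn()` and `putconn(conn)`, which psycopg2's `ThreadedConnectionPool` does.

```python
from contextlib import contextmanager
from typing import Any, Iterator


@contextmanager
def pooled_connection(pool: Any) -> Iterator[Any]:
    """Borrow a connection and always hand it back, even if the query raises."""
    conn = pool.getconn()
    try:
        yield conn
    finally:
        pool.putconn(conn)  # returning the connection keeps the pool from draining
```

A worker would then write `with pooled_connection(pool) as conn:` around its database calls. Without the `finally` step, a single failing query could leak a connection, and 120 such leaks would exhaust the pool.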

Rate Limiting + Queue Interaction

Request Flow

1. Request arrives: client sends HTTP request to API endpoint
2. Rate limiter check: Under 5000/sec for this IP?
   → YES: Continue to next step
   → NO: Return HTTP 429 error
3. Queue size check: Queue under 5000?
   → YES: Enqueue job and continue
   → NO: Return queue_full response
4. Background worker processes job: one of 50 workers picks up and executes the job

⚠️

Two Layers of Protection

Rate limiting happens BEFORE queue check. This dual-layer protection prevents both abuse (rate limiter) and system overload (queue limit).

System Performance

Component	Configuration	Purpose	Limit
Rate Limiter	1,000-5,000 req/sec per IP	Prevent abuse	Per client IP
Queue	Max 5,000 jobs	Buffer requests	System-wide
Workers	50 background threads	Process jobs	~20 jobs/sec throughput
Connection Pool	10-120 connections	Database access	Supabase limit: 1,000
Job Timeout	55 seconds	Prevent hanging	Railway: 60 sec
Retry	1 automatic retry	Handle edge cases	110 sec total

Request Path

Client → Rate Limiter (1000-5000/s) → Queue Check (< 5000 jobs) → Enqueue → Worker (50 workers) → Database (120 conn) → Response

Load Scenarios

✓ Low Load (< 10 req/sec)

• Queue Size: 0-50 jobs
• Connection Pool: 10-20 connections in use
• Response Time: < 1 second

○ Medium Load (100 req/sec)

• Queue Size: 200-400 jobs
• Connection Pool: 50-80 connections in use
• Response Time: 4-8 seconds

⚠ High Load (500 req/sec)

• Queue Size: 2000-4000 jobs
• Connection Pool: 100-120 connections in use
• Response Time: 40-80 seconds
• Warning: Approaching queue_full threshold

🔴 Overload (1000+ req/sec sustained)

• Queue Size: Hits 5,000 max
• Connection Pool: 120 connections in use
• Many requests return queue_full error
• Action Required: Client backs off, retries after 5 seconds

Job Status Polling

Endpoint

GET /job_status/{job_id}

Used when a request times out and the client needs to check the result later.

1. Queued

{
  "status": "queued",
  "result": null,
  "queue_position": 42,
  "estimated_wait_seconds": 126
}

2. Processing

{
  "status": "processing",
  "result": null,
  "queue_position": 0,
  "estimated_wait_seconds": 0
}

3. Completed

{
  "status": "completed",
  "result": {
    "company_main_id": "uuid",
    "company_workspace_id": "uuid",
    "status_company": "created",
    "status_company_workspace": "created",
    "error": null
  },
  "queue_position": 0,
  "estimated_wait_seconds": 0
}

4. Failed

{
  "status": "failed",
  "result": {
    "error": "Database connection failed"
  },
  "queue_position": 0,
  "estimated_wait_seconds": 0
}

5. Not Found

{
  "status": "not_found",
  "result": null,
  "queue_position": 0,
  "estimated_wait_seconds": 0
}
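A client can turn these five states into a simple polling loop. In the sketch below, `poll_job` and the `fetch_status` callable (which performs `GET /job_status/{job_id}` and returns the parsed JSON) are assumptions; the wait between polls is taken from `estimated_wait_seconds` and clamped to 1-10 seconds, which is an arbitrary choice.

```python
import time
from typing import Any, Callable, Dict

TERMINAL_STATES = {"completed", "failed", "not_found"}


def poll_job(fetch_status: Callable[[str], Dict[str, Any]],
             job_id: str,
             max_polls: int = 30,
             sleep: Callable[[float], None] = time.sleep) -> Dict[str, Any]:
    """Poll until the job reaches a terminal state or max_polls is used up."""
    status: Dict[str, Any] = {}
    for _ in range(max_polls):
        status = fetch_status(job_id)
        if status.get("status") in TERMINAL_STATES:
            return status
        wait = status.get("estimated_wait_seconds") or 1
        sleep(min(max(wait, 1), 10))  # clamp the hint to a sane polling interval
    return status  # last seen state, still queued or processing
```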

Queue Statistics

Endpoint

GET /queue_stats

Response

{
  "queue_size": 142,
  "active_workers": 50,
  "jobs_completed": 125847,
  "jobs_failed": 23,
  "average_processing_time_seconds": 2.3,
  "queue_capacity": 5000,
  "queue_utilization_percentage": 2.84
}

Metrics Explanation

queue_size: Current jobs waiting in queue

active_workers: Number of worker threads (always 50)

jobs_completed: Total successful jobs since startup

jobs_failed: Total failed jobs since startup

average_processing_time_seconds: Mean time per job

queue_capacity: Maximum queue size (5000)

queue_utilization_percentage: Current queue fill percentage

Health Check Indicators

✓ Healthy System

• Queue size: < 1,000
• Queue full errors: 0
• Timeout rate: < 1%
• Connection usage: < 70%
• Worker utilization: < 80%

⚠ Warning State

• Queue size: 1,000-3,000
• Queue full errors: < 10/min
• Timeout rate: 1-5%
• Connection usage: 70-90%
• Worker utilization: 80-95%

🔴 Critical State

• Queue size: > 4,000
• Queue full errors: > 50/min
• Timeout rate: > 10%
• Connection usage: > 95%
• Worker utilization: > 95%
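A monitoring script could map the `queue_size` value from `/queue_stats` onto these states. The sketch below looks at queue size only; the function names are hypothetical, and because the documented bands leave 3,000-4,000 undefined, that range is treated as "warning" here by assumption.

```python
def queue_utilization_percentage(queue_size: int, queue_capacity: int) -> float:
    """Queue fill level in percent, rounded to two decimals."""
    return round(queue_size / queue_capacity * 100, 2)


def classify_queue(queue_size: int) -> str:
    """Map queue size to a health state using the thresholds listed above."""
    if queue_size < 1000:
        return "healthy"
    if queue_size <= 4000:
        return "warning"  # assumption: the undocumented 3,000-4,000 gap counts as warning
    return "critical"
```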

System Tuning Parameters

To increase system throughput, consider adjusting the following parameters:

1. Increase Workers

await queue_manager.start_workers(num_workers=100) # Was 50

Pros: 2× throughput (34-50 jobs/sec)

Cons: More CPU usage, more database connections

2. Increase Connection Pool

pool = ThreadedConnectionPool(minconn=20, maxconn=200, dsn=...) # Was 10-120

Pros: More concurrent queries possible

Cons: May exceed Supabase limit, more memory usage

3. Increase Queue Size

MAX_QUEUE_SIZE = 10000 # Was 5000

Pros: Accept more burst load

Cons: Longer wait times, more memory usage

4. Decrease Timeout

result = await queue_manager.wait_for_job(job_id, timeout=30) # Was 55

Pros: Faster timeout detection

Cons: More timeout responses to clients

Railway Deployment Considerations

Railway Timeout

Hard Limit: 60 seconds per request

Why 55-second timeout in code?

Railway kills requests at 60 seconds
55-second timeout leaves 5-second buffer
Buffer allows time to return timeout response
Client receives job_id for polling

Automatic Retry Logic

if result.get('error') == 'timeout':
    # Retry by waiting for the job again
    retry_result = await queue_manager.wait_for_job(result['job_id'], timeout=55)
    return retry_result

Request Flow:

First wait: 55 seconds → timeout
Automatic retry: wait another 55 seconds
Total time: up to 110 seconds
If still timeout: Return job_id to client for polling

Why retry?

Job might complete just after first timeout
Gives job extra time before returning polling response
Reduces need for client polling

Startup Configuration

🚀 FastAPI server starting...
⚡ Event loop: uvloop (2-4× faster on Linux)
📊 Database pool: 10-120 connections
🔒 Rate limit: 5000 requests/second per IP
⏱️  Queue timeout: 55 seconds (with 5s buffer before Railway's 60s timeout)
🛡️  Max queue size: 5000 (backpressure enabled)

Summary Table

Component	Configuration	Purpose	Limit
Rate Limiter	1,000-5,000 req/sec per IP	Prevent abuse	Per client IP
Queue	Max 5,000 jobs	Buffer requests	System-wide
Workers	50 background threads	Process jobs	Throughput: ~20/sec
Connection Pool	10-120 connections	Database access	Supabase limit: 1,000
Job Timeout	55 seconds	Prevent hanging	Railway: 60 sec
Retry	1 automatic retry	Handle edge cases	110 sec total

Request Path

Client → Rate Limiter (1000-5000/s) → Queue Check (< 5000 jobs) → Enqueue → Worker (50 workers) → Database (120 conn) → Response

Companies

API endpoints for managing company data including lookup, retrieval, creation, updates, and field deletion.

POST

/company_lookup

Purpose: Search for an existing company in the database using identifiers like name, domain, or LinkedIn URL.

Logic Flow

1. Receive Request Data

Accepts: company_name, company_domain, company_linkedin, workspace_id

2. Clean the Data

Calls: clean_company_fields() from processing_cleaning_funtions/company_field_cleaning.py

What it does: Normalizes the input data (removes trailing slashes, converts to lowercase, standardizes URLs)

3. Check Cache

• Generates a unique cache key based on the input parameters
• Checks if this lookup was done recently
• If found in cache: Returns the cached result immediately
• If not in cache: Continues to database lookup

4. Lookup Company

Calls: lookup_company() from processing_lookup_funtions/company_lookup.py

What it does:

• Searches the v_company_lookup database view
• Checks if the provided name, domain, or LinkedIn matches any company
• Uses OR logic: matches if ANY identifier matches
• Returns FIRST match found (LIMIT 1)
• If workspace_id is provided: Also checks if this company is connected to that workspace

Returns: company_main_id and company_workspace_id (if found)

5. Cache the Result

Saves the result to cache for 1 hour. This includes "not found" results to avoid repeated database queries.
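The cache steps (deterministic key, one-hour expiry, caching of "not found" results) can be sketched as follows. This is an in-memory illustration with hypothetical names (`make_cache_key`, `TTLCache`); the actual cache backend and key format are not described in this document.

```python
import hashlib
import json
import time
from typing import Any, Callable, Dict, Optional, Tuple

CACHE_TTL_SECONDS = 3600  # 1 hour


def make_cache_key(prefix: str, params: Dict[str, Any]) -> str:
    """Same parameters in any order produce the same key."""
    canonical = json.dumps(params, sort_keys=True, separators=(",", ":"))
    return prefix + ":" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class TTLCache:
    def __init__(self, ttl: float = CACHE_TTL_SECONDS,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl
        self.clock = clock
        self._store: Dict[str, Tuple[float, Dict[str, Any]]] = {}

    def get(self, key: str) -> Optional[Dict[str, Any]]:
        entry = self._store.get(key)
        if entry is None:
            return None  # cache miss
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self._store[key]  # expired entries count as a miss
            return None
        return value

    def set(self, key: str, value: Dict[str, Any]) -> None:
        self._store[key] = (self.clock() + self.ttl, value)
```

A "not found" result is stored as a full response dictionary with null IDs, so it is distinguishable from a cache miss, which returns `None`.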

6. Return Response

Returns: {company_main_id, company_workspace_id, error}

company_main_id: The unique ID of the company (or null if not found)
company_workspace_id: The ID of the workspace connection (or null if not found or no workspace_id provided)

Error Handling in API File

The process_company_lookup() function uses try/except to handle errors:

Try Block:

• Validates that data is provided (if not: returns error "No data provided")
• Sets connection pool for lookup module
• Calls cleaning function to normalize data
• Calls lookup function (which has its own internal error handling)
• Caches the result
• Returns formatted response

Except Block:

• Catches any unexpected errors during the process
• Returns response with null IDs and error message as string
• Examples: Cleaning function fails, Cache system errors, Unexpected exceptions

Called Functions:

• clean_company_fields(): Normalizes company data (URLs, whitespace, domains)
• lookup_company(): Queries database for company match, manages database connection

Match Priority

When searching with multiple identifiers (e.g., both domain and LinkedIn), the system:

• Uses OR logic: Matches if ANY identifier matches
• Returns the FIRST match found
• Does NOT rank or score matches

Example:

Input: company_domain = "example.com", company_linkedin = "linkedin.com/company/different-company"

Result: Returns whichever company is found first in the database

Best Practice: Use the most specific identifier you have (domain is usually most reliable)

Error Scenarios

No identifiers provided:

{company_main_id: null, company_workspace_id: null, error: "No lookup fields provided"}

Company not found:

{company_main_id: null, company_workspace_id: null, error: null}

Note: This is a valid result (company doesn't exist), not an error

Database error:

{company_main_id: null, company_workspace_id: null, error: "Database query failed: [details]"}

Company found but not in workspace:

{company_main_id: "uuid", company_workspace_id: null, error: null}

This means: Company exists globally, but not connected to the specified workspace

Practical Examples

Request:

POST /company_lookup
{
  "company_domain": "anthropic.com",
  "workspace_id": "workspace-123"
}

Response - Company found and in workspace:

{
  "company_main_id": "f7e9a8b1-1234-5678-9abc-def123456789",
  "company_workspace_id": "a1b2c3d4-5678-9abc-def1-23456789abcd",
  "error": null
}

Response - Company found but NOT in workspace:

{
  "company_main_id": "f7e9a8b1-1234-5678-9abc-def123456789",
  "company_workspace_id": null,
  "error": null
}

Response - Company not found:

{
  "company_main_id": null,
  "company_workspace_id": null,
  "error": null
}

People

API endpoints for managing contact/people data including lookup, retrieval, creation, updates, and field deletion.

POST

/contact_lookup

Purpose: Search for an existing contact (person and/or lead) in the database using identifiers like email, LinkedIn, Xing, or name.

Logic Flow

1. Receive Request Data

Accepts: contact_email_valid, contact_email_catch_all, contact_email_invalid, contact_email_unsure, contact_linkedin, contact_xing, contact_first_name, contact_first_name_cleaned, contact_last_name, contact_last_name_cleaned, company_id, workspace_id

2. Clean the Data

Calls: clean_contact_fields() from processing_cleaning_funtions/contact_field_cleaning.py

What it does: Normalizes the input data (lowercase emails, standardizes URLs, trims whitespace)

3. Check Cache

• Generates a unique cache key based on the input parameters
• Checks if this lookup was done recently
• If found in cache: Returns the cached result immediately
• If not in cache: Continues to database lookup

4. Lookup Contact (Two-Phase Lookup)

Calls: lookup_contact() from processing_lookup_funtions/contact_lookup.py

Phase 1 - Lead Lookup (with company_id):

Searches for a lead (person working at a specific company)

Matches by:

• LinkedIn + company_id
• Xing + company_id
• Email addresses (any type)
• First name + Last name + company_id

Uses OR logic: Matches if ANY condition is true

If match found: Returns both lead_id and person_id, then STOPS (doesn't run Phase 2)

Phase 2 - Person Lookup (without company_id):

Only runs if Phase 1 found nothing

Searches for a person regardless of company

Matches by:

• LinkedIn (without company requirement)
• Xing (without company requirement)

If match found: Returns person_id only (lead_id is null)

Workspace Lookup:

If workspace_id is provided and person/lead is found:

• Checks if the person is connected to that workspace
• Checks if the lead is connected to that workspace

Returns: lead_id, person_id, lead_workspace_id, people_workspace_id
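The two-phase logic can be illustrated with plain in-memory records. This is a sketch, not the database query used by `lookup_contact()`: the record shapes (`email`, `linkedin`, `xing`, `first_name`, `last_name` keys) and the function name are hypothetical, and the workspace lookup is left out.

```python
from typing import Any, Dict, List, Optional

EMAIL_FIELDS = ("contact_email_valid", "contact_email_catch_all",
                "contact_email_invalid", "contact_email_unsure")


def _same(a: Optional[str], b: Optional[str]) -> bool:
    return a is not None and a == b


def lookup_contact_sketch(query: Dict[str, Any],
                          leads: List[Dict[str, Any]],
                          people: List[Dict[str, Any]]) -> Dict[str, Optional[str]]:
    emails = {query[f] for f in EMAIL_FIELDS if query.get(f)}
    company_id = query.get("company_id")

    # Phase 1: lead lookup, OR logic, first match wins.
    for lead in leads:
        at_company = _same(company_id, lead.get("company_id"))
        name_match = (_same(query.get("contact_first_name"), lead.get("first_name"))
                      and _same(query.get("contact_last_name"), lead.get("last_name")))
        if (lead.get("email") in emails
                or (at_company and _same(query.get("contact_linkedin"), lead.get("linkedin")))
                or (at_company and _same(query.get("contact_xing"), lead.get("xing")))
                or (at_company and name_match)):
            return {"lead_id": lead["lead_id"], "person_id": lead["person_id"]}

    # Phase 2: person lookup by profile URL only, runs when Phase 1 found nothing.
    for person in people:
        if (_same(query.get("contact_linkedin"), person.get("linkedin"))
                or _same(query.get("contact_xing"), person.get("xing"))):
            return {"lead_id": None, "person_id": person["person_id"]}

    return {"lead_id": None, "person_id": None}
```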

5. Cache the Result

Saves the result to cache for 1 hour. This includes "not found" results to avoid repeated database queries.

6. Return Response

Returns: {lead_id, person_id, lead_workspace_id, people_workspace_id, error}

lead_id: The unique ID of the lead (or null if not found)
person_id: The unique ID of the person (or null if not found)
lead_workspace_id: The ID of the lead's workspace connection (or null)
people_workspace_id: The ID of the person's workspace connection (or null)

Understanding Person vs Lead

Person: Represents an individual with basic information (name, demographics, career history)
Lead: Represents a person's connection to a specific company (their position, department, start date at that company)
One person can have multiple leads (if they worked at multiple companies)
The lookup prioritizes finding leads (person + company match) over just finding the person

Example:

• John Smith works at Company A as CEO (Lead 1)
• John Smith works at Company B as Advisor (Lead 2)
• Both leads link to the same Person record (John Smith)

Match Priority

The lookup follows a strict priority order:

1. Email (any type) - Always checked first in Phase 1
2. LinkedIn + company_id - Checked in Phase 1
3. Xing + company_id - Checked in Phase 1
4. Name + company_id - Checked in Phase 1
5. LinkedIn alone - Only checked in Phase 2 (if Phase 1 fails)
6. Xing alone - Only checked in Phase 2 (if Phase 1 fails)

Why this order?

• Emails are unique and most reliable
• Social profiles with company context are more specific than just names
• Phase 2 is a fallback for when we don't have company context

Returns first match: Like company lookup, returns the first match found

Error Scenarios

No identifiers provided:

{lead_id: null, person_id: null, lead_workspace_id: null, people_workspace_id: null, error: "No lookup fields available"}

Contact not found:

{lead_id: null, person_id: null, lead_workspace_id: null, people_workspace_id: null, error: null}

Note: This is a valid result (contact doesn't exist), not an error

Person found but not lead (Phase 2 success):

{lead_id: null, person_id: "uuid", lead_workspace_id: null, people_workspace_id: null, error: null}

This means: Person exists but we don't know their connection to the specified company

Lead found but not in workspace:

{lead_id: "uuid", person_id: "uuid", lead_workspace_id: null, people_workspace_id: "uuid", error: null}

This means: Lead exists, person is in workspace, but lead is not in workspace

Database error:

{lead_id: null, person_id: null, lead_workspace_id: null, people_workspace_id: null, error: "Database query failed: [details]"}

Practical Examples

Example 1: Full lead match with workspace

Request:

POST /contact_lookup
{
  "contact_email_valid": "john@anthropic.com",
  "company_id": "company-uuid-123",
  "workspace_id": "workspace-456"
}

Response:

{
  "lead_id": "lead-789",
  "person_id": "person-abc",
  "lead_workspace_id": "lead-ws-xyz",
  "people_workspace_id": "people-ws-def",
  "error": null
}

Interpretation: John works at this company (lead found), and both the lead and person are in the workspace.

Example 2: Person found but no lead (Phase 2 success)

Request:

POST /contact_lookup
{
  "contact_linkedin": "linkedin.com/in/johndoe",
  "company_id": "company-uuid-999"
}

Response:

{
  "lead_id": null,
  "person_id": "person-abc",
  "lead_workspace_id": null,
  "people_workspace_id": null,
  "error": null
}

Interpretation: John exists in database, but we don't have a record of him working at company 999. Phase 1 found nothing (no lead at that company), but Phase 2 found John by LinkedIn.

Example 3: Not found

Request:

POST /contact_lookup
{
  "contact_email_valid": "unknown@example.com"
}

Response:

{
  "lead_id": null,
  "person_id": null,
  "lead_workspace_id": null,
  "people_workspace_id": null,
  "error": null
}

Interpretation: This person doesn't exist in the database at all.

GET

/company_get

Purpose: Retrieve complete company data including all fields and identifiers. Takes query parameters: company_id (required) and workspace_id (optional), e.g. company_id=xxx&workspace_id=yyy

Logic Flow

1. Receive Request Parameters

Required: company_id
Optional: workspace_id

2. Get Main Company Data

Queries: db_companies_main table

Retrieves all company fields:

• company_name_cleaned, company_legal_form, b2b_b2c
• Address fields (street, city, zip, region, country)
• Registration numbers (tax, VAT, Handelsregister)
• Employee counts, founded year, description
• Logo URL, LinkedIn size and followers
• Arrays: company_tags, company_sources
• Timestamps: created_at, updated_at

3. Get Company Identifiers

Queries: db_companies_dt_identifiers table

Retrieves all identifiers for this company

Organizes them by type into separate arrays:

• company_names: Array of all name variations
• company_domains: Array of all domains
• company_emails: Array of all email addresses
• company_phones: Array of all phone numbers
• company_linkedins: Array of all LinkedIn URLs
• company_instagrams: Array of all Instagram URLs
• company_facebooks: Array of all Facebook URLs
• company_xings: Array of all Xing URLs
• company_pinterests: Array of all Pinterest URLs
• company_tiktoks: Array of all TikTok URLs
• company_youtubes: Array of all YouTube URLs
• company_twitters: Array of all Twitter URLs
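Step 3 can be sketched as a grouping pass over the identifier rows. The row shape (type, value) and the type labels used as keys below are assumptions for illustration; only the output array names come from this page.

```python
# Hypothetical mapping from an identifier type label to its output array.
# The labels on the left are assumed; the array names are the documented ones.
TYPE_TO_ARRAY = {
    "name": "company_names",
    "domain": "company_domains",
    "email": "company_emails",
    "phone": "company_phones",
    "linkedin": "company_linkedins",
    "instagram": "company_instagrams",
    "facebook": "company_facebooks",
    "xing": "company_xings",
    "pinterest": "company_pinterests",
    "tiktok": "company_tiktoks",
    "youtube": "company_youtubes",
    "twitter": "company_twitters",
}


def group_identifiers(rows):
    """Group (type, value) rows into one array per identifier type."""
    # Start every array empty so the result always contains all twelve keys.
    grouped = {array_name: [] for array_name in TYPE_TO_ARRAY.values()}
    for ident_type, value in rows:
        array_name = TYPE_TO_ARRAY.get(ident_type)
        if array_name is not None:  # skip types this sketch does not know
            grouped[array_name].append(value)
    return grouped


rows = [
    ("domain", "anthropic.com"),
    ("name", "Anthropic"),
    ("domain", "www.anthropic.com"),
]
result = group_identifiers(rows)
print(result["company_domains"])
print(result["company_xings"])
```

Pre-filling every array with an empty list means a company without, say, Xing URLs still gets company_xings: [] rather than a missing key.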

4. Get Workspace Data (if workspace_id provided)

Queries: db_companies_workspace table

Retrieves workspace-specific data:

• company_workspace_id: The ID of the workspace connection
• company_qualified: Qualification status in this workspace
• company_custom_tags_ws: Custom tags for this workspace
• Timestamps: workspace created_at, updated_at

If workspace_id is not provided, or no workspace record is found for this company: Returns null values for workspace fields

5. Return Response

Returns: Complete company data dictionary with:

• All main table fields
• All identifiers organized in arrays by type
• Workspace data (if workspace_id was provided)
• Timestamps renamed to indicate source table (e.g., db_companies_main_created_at)

Error Handling in API File

The process_company_get() function uses try/except to handle errors:

Try Block:

• Validates that company_id is provided (if not: returns error "company_id is required")
• Gets database connection from pool
• Queries db_companies_main table for company data
• If company not found: Returns error "Company not found"
• Queries db_companies_dt_identifiers and organizes by type
• If workspace_id provided: Queries db_companies_workspace table
• Returns complete company data dictionary

Except Block:

• Catches any unexpected errors during database queries or data processing
• Returns error response with error message
• Examples: database connection failures, query execution errors, data formatting errors

Note:

No cleaning functions are called (get endpoints don't need cleaning). Database connection is managed internally.

Why Use Company Get?

Company Lookup tells you IF a company exists and returns just the IDs
Company Get gives you ALL the information about a company once you know its ID
Use lookup first to find the company, then use get to retrieve all details

Error Scenarios

Company doesn't exist:

{"error": "Company not found", "company_id": "xxx"}

This happens when the company_id doesn't exist in the database

Company exists but not in workspace:

Returns: Complete company data with workspace fields as null

company_workspace_id: null, company_qualified: null, company_custom_tags_ws: null

Database error:

{"error": "Database error: [error details]", "company_id": "xxx"}

Empty identifier arrays:

Always returns arrays (even if empty): company_domains: []

This is normal - not all companies have all types of identifiers

Practical Example

Request:

GET /company_get?company_id=f7e9a8b1-1234-5678-9abc-def123456789&workspace_id=workspace-123

Response:

{
  "company_name_cleaned": "Anthropic",
  "company_legal_form": "Inc",
  "b2b_b2c": "B2B",
  "company_name_imprint": null,
  "company_street": "Main St",
  "company_street_nr": "123",
  "company_city": "San Francisco",
  "company_zip": "94105",
  "company_region": "California",
  "company_country": "USA",
  "company_tax_nr": null,
  "company_vat_nr": null,
  "company_handels_register": null,
  "company_employees_research": "100-500",
  "company_employees_linkedin": "250",
  "company_founded_year": "2021",
  "company_description": "AI safety and research company",
  "company_logo_url": "https://...",
  "company_size_linkedin": "201-500",
  "company_linkedin_followers": "50000",
  "company_tags": ["AI", "Research", "Safety"],
  "company_sources": ["LinkedIn", "Website"],
  "db_companies_main_created_at": "2024-01-15T10:30:00Z",
  "db_companies_main_updated_at": "2024-03-20T14:45:00Z",

  "company_names": ["Anthropic", "Anthropic PBC"],
  "company_domains": ["anthropic.com", "www.anthropic.com"],
  "company_emails": ["info@anthropic.com", "contact@anthropic.com"],
  "company_phones": ["+1-555-0100"],
  "company_linkedins": ["linkedin.com/company/anthropic"],
  "company_instagrams": [],
  "company_facebooks": [],
  "company_xings": [],
  "company_pinterests": [],
  "company_tiktoks": [],
  "company_youtubes": [],
  "company_twitters": ["twitter.com/anthropicai"],

  "company_workspace_id": "ws-abc-123",
  "company_qualified": "high",
  "company_custom_tags_ws": ["hot-lead", "enterprise"],
  "db_companies_workspace_created_at": "2024-02-01T09:00:00Z",
  "db_companies_workspace_updated_at": "2024-03-15T16:20:00Z"
}
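A request like the one above could be built with Python's standard library as follows. This is a sketch: the helper name is hypothetical, the base URL and the api_key header are taken from the top of this reference, and the request is only constructed here, not sent.

```python
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "https://web-production-603a8.up.railway.app"


def build_company_get_request(api_key: str, company_id: str,
                              workspace_id: Optional[str] = None) -> Request:
    """Build (but do not send) a GET /company_get request."""
    params = {"company_id": company_id}
    if workspace_id is not None:
        # workspace_id is optional; leave it out to skip workspace data.
        params["workspace_id"] = workspace_id
    url = f"{BASE_URL}/company_get?{urlencode(params)}"
    return Request(url, headers={"api_key": api_key}, method="GET")


req = build_company_get_request(
    "your_api_key", "f7e9a8b1-1234-5678-9abc-def123456789", "workspace-123"
)
print(req.full_url)
```

To actually send it, pass the request to urllib.request.urlopen and parse the body with json.load. A response containing a non-null error field should be treated as a failure even if the body parses.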

GET

/contact_get

Purpose: Retrieve complete contact data including person information, lead information, and all identifiers. Query parameters: lead_id (required) and workspace_id (optional), for example lead_id=xxx&workspace_id=yyy

Logic Flow

1. Receive Request Parameters

Required: lead_id
Optional: workspace_id

2. Get Lead Data

Queries: db_leads table

Retrieves all lead fields:

people_id: Reference to the person record
companies_main_id: Reference to the company
lead_position, lead_position_cleaned
lead_seniority, lead_departement
still_at_company: Boolean indicating if person still works there
lead_start_date, lead_end_date: Employment dates
lead_seniority_enum, lead_departement_enum: Standardized values
Position variations (plural forms)
lead_summary: Summary of the lead
lead_sources: Array of data sources
Timestamps: created_at, updated_at

Extracts people_id to retrieve person data

3. Get Person Data

Queries: db_people table using the people_id

Retrieves all person fields:

Name fields (first name, last name, cleaned versions)
person_gender, person_language
Birth information (year, date, estimated year)
Location (country, city, state)
LinkedIn CV and volunteering experience
Career dates (education start, first job start)
contact_location, contact_academic_title
person_native_german, person_scooling_country
LinkedIn profile image URL
LinkedIn followers and connections count
Timestamps: created_at, updated_at

4. Get People Identifiers

Queries: db_people_identifiers table

Retrieves social profile identifiers

Organizes them into arrays:

contact_linkedins: Array of all LinkedIn URLs
contact_xings: Array of all Xing URLs

5. Get Lead Identifiers

Queries: db_leads_identifiers table

Retrieves email identifiers with their validation status

Organizes them by status into separate arrays:

contact_emails_valid: Array of validated emails
contact_emails_invalid: Array of invalid emails
contact_emails_catch_all: Array of catch-all emails
contact_emails_wrong: Array of wrong emails
contact_emails_unsure: Array of emails with unsure status

6. Get Workspace Data (if workspace_id provided)

Queries: db_leads_workspace table

Retrieves workspace-specific lead data:

lead_workspace_id: The ID of the workspace connection
lead_qualified_ws: Qualification status in this workspace
Timestamps: workspace created_at, updated_at

If workspace_id is not provided, or no workspace record is found for this lead: Returns null values for workspace fields

7. Return Response

Returns: Complete contact data dictionary with:

All lead fields
All person fields
People identifiers in arrays
Lead identifiers organized by email status
Workspace data (if workspace_id was provided)
Timestamps renamed to indicate source table

Error Handling in API File

The process_contact_get() function in api_contact_get.py uses try/except to handle errors:

Try Block:

Validates that lead_id is provided (if not: returns error "lead_id is required")
Gets database connection from pool
Queries db_leads table for lead data
If lead not found: Returns error "Lead not found"
Extracts people_id from lead data
Queries db_people table using people_id
Queries db_people_identifiers and organizes by type
Queries db_leads_identifiers and organizes by email status
If workspace_id provided: Queries db_leads_workspace table
Returns complete contact data dictionary

Except Block:

Catches any unexpected errors during database queries or data processing
Returns error response with error message
Examples of caught errors:
- Database connection fails
- Query execution errors
- Data formatting errors

Note: No cleaning functions are called (get endpoints don't need cleaning). Database connection is managed internally.

Why Use Contact Get?

Contact Lookup tells you IF a contact exists and returns just the IDs
Contact Get gives you ALL the information about a contact once you know the lead_id
Use lookup first to find the contact, then use get to retrieve all details
Contact Get requires lead_id (not person_id) because it's designed to get the full context of a person's role at a specific company

Why Lead ID and Not Person ID?

Contact Get requires a lead_id because:

It's designed to show a person in the context of a specific company
One person can work at multiple companies (multiple leads)
If you used person_id, the API wouldn't know which company context to show

Example:

John Smith (person) works at Company A as CEO (lead 1)
John Smith (same person) works at Company B as Advisor (lead 2)
Calling contact_get with lead 1's ID shows John's CEO role at Company A
Calling contact_get with lead 2's ID shows John's Advisor role at Company B
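The same example as a small data sketch. The in-memory dicts stand in for db_people and db_leads, and contact_view is a hypothetical helper that mimics how a lead row is joined with its person row; it is not the endpoint's implementation.

```python
# One person, two leads (one per company).
people = {
    "person-1": {"person_first_name": "John", "person_last_name": "Smith"},
}
leads = {
    "lead-1": {"people_id": "person-1", "companies_main_id": "company-a",
               "lead_position_cleaned": "CEO"},
    "lead-2": {"people_id": "person-1", "companies_main_id": "company-b",
               "lead_position_cleaned": "Advisor"},
}


def contact_view(lead_id: str) -> dict:
    """Return lead fields merged with the fields of the linked person."""
    lead = leads.get(lead_id)
    if lead is None:
        return {"error": "Lead not found", "lead_id": lead_id}
    # people_id on the lead row points at the person record.
    person = people.get(lead["people_id"], {})
    return {**lead, **person}


print(contact_view("lead-1")["lead_position_cleaned"])  # CEO
print(contact_view("lead-2")["lead_position_cleaned"])  # Advisor
```

Starting from person-1 alone there would be two candidate roles, which is why the lookup key is the lead.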

Error Handling

Scenario: Lead doesn't exist

{"error": "Lead not found", "lead_id": "xxx"}

This happens when the lead_id doesn't exist in the database

Scenario: Lead exists but person data missing

Complete lead data with null values for all person fields

This is rare but can happen if data integrity issues exist

Scenario: Lead exists but not in workspace

Complete contact data with workspace fields as null

Example: lead_workspace_id: null, lead_qualified_ws: null

Scenario: Database error

{"error": "Database error: [error details]", "lead_id": "xxx"}

Scenario: Empty identifier arrays

contact_linkedins: []

The endpoint always returns arrays, even when they are empty. This is normal: not all contacts have all types of identifiers.

Practical Example

Request:

GET /contact_get?lead_id=lead-789-xyz&workspace_id=workspace-456

Response:

{
  "people_id": "person-abc-123",
  "companies_main_id": "company-xyz-789",
  "lead_position": "Chief Executive Officer",
  "lead_position_cleaned": "CEO",
  "lead_seniority": "C-Level",
  "lead_departement": "Executive",
  "still_at_company": true,
  "lead_start_date": "2020-01-01",
  "lead_end_date": null,
  "lead_seniority_enum": "c_level",
  "lead_departement_enum": "executive",
  "lead_position_clean_plural_dativ": "CEOs",
  "lead_position_clean_plural_nominativ": "CEOs",
  "lead_summary": "Experienced executive in AI industry",
  "lead_sources": ["LinkedIn", "Company Website"],
  "db_leads_created_at": "2024-01-10T11:00:00Z",
  "db_leads_updated_at": "2024-03-18T15:30:00Z",

  "person_first_name": "John",
  "contact_first_name_cleaned": "John",
  "person_last_name": "Smith",
  "contact_last_name_cleaned": "Smith",
  "person_gender": "male",
  "person_language": "English",
  "contact_estimated_birth_year": "1980",
  "contact_birth_year": "1980",
  "contact_birth_date": "1980-05-15",
  "person_country": "USA",
  "person_city": "San Francisco",
  "linkedin_cv": "Detailed career history...",
  "linkedin_volunteerings": "Board member at...",
  "started_education_linkedin": "1998",
  "first_job_start_linkedin": "2002",
  "contact_location": "San Francisco, CA",
  "contact_academic_title": "PhD",
  "person_state": "California",
  "person_native_german": false,
  "person_scooling_country": "USA",
  "contact_linkedin_image_url": "https://...",
  "person_linkedin_followers": "5000",
  "person_linkedin_connections": "500+",
  "db_people_created_at": "2024-01-05T09:00:00Z",
  "db_people_updated_at": "2024-03-10T14:00:00Z",

  "contact_linkedins": [
    "linkedin.com/in/johnsmith",
    "linkedin.com/in/john-smith-ceo"
  ],
  "contact_xings": [],

  "contact_emails_valid": [
    "john@company.com",
    "john.smith@company.com"
  ],
  "contact_emails_invalid": [],
  "contact_emails_catch_all": [
    "info@company.com"
  ],
  "contact_emails_wrong": [],
  "contact_emails_unsure": [
    "j.smith@company.com"
  ],

  "lead_workspace_id": "lead-ws-xyz-123",
  "lead_qualified_ws": "qualified",
  "db_leads_workspace_created_at": "2024-02-05T10:00:00Z",
  "db_leads_workspace_updated_at": "2024-03-12T16:45:00Z"
}

POST

/company_push

Purpose: Look up company and automatically create it if not found. Also creates workspace connection if workspace_id is provided and connection doesn't exist.

Logic Flow

1. Receive Request Data

Accepts: All company fields, workspace_id

2. Set Connection Pools

Sets connection pool for: company_lookup, company_push, company_workspace_push

3. Clean the Data

Calls: clean_company_fields() from processing_cleaning_funtions/company_field_cleaning.py

What it does: Normalizes company data

4. Lookup Company

Calls: lookup_company() from processing_lookup_funtions/company_lookup.py

NO CACHING - Always fresh lookup to determine correct status

Returns: company_main_id and company_workspace_id (if found)

5. Determine Company Status

If company NOT found:

Calls: push_company() from processing_push_funtions/company_push.py
What it does: Creates new company record in db_companies_main and identifiers in db_companies_dt_identifiers
Sets status_company = "created"
Updates company_main_id with newly created ID

If company found:

Sets status_company = "found"
No push operation needed

6. Determine Workspace Status (if workspace_id provided and company exists)

If workspace connection NOT found:

Calls: push_company_workspace() from processing_push_funtions/company_workspace_push.py
What it does: Creates workspace connection in db_companies_workspace
Sets status_company_workspace = "created"
Updates company_workspace_id with newly created ID

If workspace connection found:

Sets status_company_workspace = "found"
No push operation needed

7. Return Response

Returns: {company_main_id, company_workspace_id, status_company, status_company_workspace, error}

status_company: "found" or "created"
status_company_workspace: "found" or "created" (or null if no workspace_id provided)
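The status logic in steps 4 to 7 can be sketched with the database functions passed in as callables. The signatures and return shapes of lookup, push_company and push_workspace below are assumptions made for this example; the real functions in the processing modules may differ.

```python
def company_push_flow(data, workspace_id, lookup, push_company, push_workspace):
    """Sketch of the found/created decision for company and workspace."""
    found = lookup(data, workspace_id)  # assumed to return a dict of IDs
    company_main_id = found.get("company_main_id")
    company_workspace_id = found.get("company_workspace_id")

    if company_main_id is None:
        company_main_id = push_company(data)
        status_company = "created"
    else:
        status_company = "found"

    status_company_workspace = None  # stays null without a workspace_id
    if workspace_id is not None and company_main_id is not None:
        if company_workspace_id is None:
            company_workspace_id = push_workspace(company_main_id, workspace_id)
            status_company_workspace = "created"
        else:
            status_company_workspace = "found"

    return {
        "company_main_id": company_main_id,
        "company_workspace_id": company_workspace_id,
        "status_company": status_company,
        "status_company_workspace": status_company_workspace,
        "error": None,
    }


# Stub callables standing in for the database layer; the payload is illustrative.
result = company_push_flow(
    {"company_name_cleaned": "Example Company GmbH"},
    "ws-123",
    lookup=lambda data, ws: {"company_main_id": "uuid-123", "company_workspace_id": None},
    push_company=lambda data: "uuid-new",
    push_workspace=lambda company_id, ws: "cw-1",
)
print(result["status_company"], result["status_company_workspace"])
```

In this run the company already exists, so only the workspace connection is created.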

Error Handling in API File

The process_company_push() function handles errors at multiple stages:

Validation:

If no data provided: Returns error "No data provided"

Lookup Error:

If lookup_company() returns error: Returns error from lookup function

Push Errors:

If push_company() fails: Returns error "company_push failed" with details
If push_company_workspace() fails: Returns company_main_id and status_company, plus an error for the workspace operation

Called Functions (Brief Description):

clean_company_fields(): Normalizes company data
lookup_company(): Searches database for company
push_company(): Creates new company record with identifiers, returns company_id
push_company_workspace(): Creates workspace connection, returns workspace connection id

POST

/contact_push

Purpose: Look up contact and automatically create person, lead, and workspace connections if not found.

Logic Flow

1. Receive Request Data

Accepts: All contact fields, company_id, workspace_id

2. Set Connection Pools

Sets connection pool for: contact_lookup, people_push, lead_push, people_workspace_push, lead_workspace_push

3. Extract IDs

Extracts company_id and workspace_id from data

4. Clean the Data

Calls: clean_contact_fields() from processing_cleaning_funtions/contact_field_cleaning.py

What it does: Normalizes contact data

5. Lookup Contact

Calls: lookup_contact() from processing_lookup_funtions/contact_lookup.py

NO CACHING - Always fresh lookup

Returns: lead_id, person_id, lead_workspace_id, people_workspace_id

6. Determine Person Status

If person NOT found:

Calls: create_person_record() from processing_push_funtions/people_push.py
What it does: Creates new person record in db_people and identifiers in db_people_identifiers
Sets status_person = "created"

If person found:

Sets status_person = "found"

7. Determine Lead Status (if person_id and company_id exist)

If lead NOT found:

Calls: create_lead_record() from processing_push_funtions/lead_push.py
What it does: Creates new lead record in db_leads connecting person to company, creates email identifiers in db_leads_identifiers
Sets status_lead = "created"

If lead found:

Sets status_lead = "found"

8. Determine People Workspace Status (if workspace_id and person_id exist)

If people workspace NOT found:

Calls: create_people_workspace_connection() from processing_push_funtions/people_workspace_push.py
What it does: Creates workspace connection in db_people_workspace
Sets status_people_workspace = "created"

If people workspace found:

Sets status_people_workspace = "found"

9. Determine Lead Workspace Status (if workspace_id and lead_id exist)

If lead workspace NOT found:

Calls: create_lead_workspace_connection() from processing_push_funtions/lead_workspace_push.py
What it does: Creates workspace connection in db_leads_workspace
Sets status_lead_workspace = "created"

If lead workspace found:

Sets status_lead_workspace = "found"

10. Return Response

Returns: {person_id, lead_id, people_workspace_id, lead_workspace_id, status_person, status_lead, status_people_workspace, status_lead_workspace, error}

All status values: "found" or "created"

Error Handling in API File

The process_contact_push() function handles errors at multiple stages:

Validation:

If no data provided: Returns error "No data provided"

Lookup Error:

If lookup_contact() returns error: Returns all null IDs with error

Push Errors (cascading returns):

If people_push fails: Returns error immediately
If lead_push fails: Returns person_id and status_person, but error for lead
If people_workspace_push fails: Returns person_id, lead_id, and statuses, but error for people workspace
If lead_workspace_push fails: Returns all IDs and statuses except lead workspace
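The cascading pattern means each stage returns whatever has succeeded so far as soon as a later stage fails. A sketch covering only the person and lead stages, with the create functions passed in as hypothetical callables that raise on failure (the real modules may signal failure differently, and the error strings here are illustrative):

```python
def contact_push_cascade(create_person, create_lead):
    """Return partial results as soon as one stage fails."""
    result = {"person_id": None, "lead_id": None,
              "status_person": None, "status_lead": None, "error": None}

    try:
        result["person_id"] = create_person()
        result["status_person"] = "created"
    except Exception as exc:
        # First stage failed: nothing to report besides the error.
        result["error"] = f"people_push failed: {exc}"
        return result

    try:
        result["lead_id"] = create_lead(result["person_id"])
        result["status_lead"] = "created"
    except Exception as exc:
        # Person data is kept; only the lead part carries the error.
        result["error"] = f"lead_push failed: {exc}"
        return result

    return result


def failing_lead(person_id):
    raise RuntimeError("no company_id")


partial = contact_push_cascade(lambda: "person-abc", failing_lead)
print(partial)
```

Callers should therefore check error and the individual IDs separately: a non-null error does not mean that nothing was written.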

Called Functions (Brief Description):

clean_contact_fields(): Normalizes contact data
lookup_contact(): Searches database for contact using two-phase lookup
create_person_record(): Creates person in db_people with identifiers
create_lead_record(): Creates lead in db_leads with email identifiers
create_people_workspace_connection(): Creates workspace link for person
create_lead_workspace_connection(): Creates workspace link for lead

POST

/company_push_patch

Purpose: Look up company and CREATE if not found OR UPDATE if found. Same for workspace connections.

Logic Flow

1. Receive Request Data

Accepts: All company fields, workspace_id

2. Set Connection Pools

Sets connection pool for: company_lookup, company_push, company_workspace_push, company_update, company_workspace_update

3. Clean the Data

Calls: clean_company_fields()

What it does: Normalizes company data

4. Lookup Company

Calls: lookup_company()

NO CACHING - Always fresh lookup

5. Determine Company Status and Action

If company NOT found:

Calls: push_company() from processing_push_funtions/company_push.py
What it does: Creates new company record
Sets status_company = "created"

If company found:

Calls: update_company_identifiers() from processing_update_funtions/company_update.py
What it does: Updates existing company fields and adds new identifiers (non-destructive)
Sets status_company = "updated"

6. Determine Workspace Status and Action (if workspace_id and company_main_id exist)

If workspace connection NOT found:

Calls: push_company_workspace() from processing_push_funtions/company_workspace_push.py
What it does: Creates workspace connection
Sets status_company_workspace = "created"

If workspace connection found:

Calls: update_company_workspace() from processing_update_funtions/company_workspace_update.py
What it does: Updates workspace connection fields (non-destructive)
Sets status_company_workspace = "updated"

7. Return Response

Returns: {company_main_id, company_workspace_id, status_company, status_company_workspace, error}

status_company: "created" or "updated"
status_company_workspace: "created" or "updated"

Error Handling in API File

The process_company_push_patch() function handles errors at multiple stages:

Validation:

If no data provided: Returns error "No data provided"

Lookup Error:

If lookup_company() returns error: Returns error from lookup

Push/Update Errors:

If push_company() fails: Returns error "company_push failed"
If update_company_identifiers() fails: Returns company_main_id, plus an error for the update
If push_company_workspace() fails: Returns company info, plus an error for the workspace
If update_company_workspace() fails: Returns company info, plus an error for the workspace update

Called Functions (Brief Description):

clean_company_fields(): Normalizes company data
lookup_company(): Searches database for company
push_company(): Creates new company with identifiers
update_company_identifiers(): Updates company fields and adds new identifiers without removing existing data
push_company_workspace(): Creates workspace connection
update_company_workspace(): Updates workspace fields without removing existing data

POST

/contact_push_patch

Purpose: Look up contact and CREATE if not found OR UPDATE if found for person, lead, and workspace connections.

Logic Flow

1. Receive Request Data

Accepts: All contact fields, company_id, workspace_id

2. Set Connection Pools

Sets connection pool for: contact_lookup, people_push, lead_push, people_workspace_push, lead_workspace_push, people_update, lead_update, people_workspace_update, lead_workspace_update

3. Extract IDs and Clean Data

Extracts company_id and workspace_id

Calls: clean_contact_fields()

4. Lookup Contact

Calls: lookup_contact()

NO CACHING

5. Determine Person Status and Action

If person NOT found:

Calls: create_person_record() from processing_push_funtions/people_push.py
Sets status_person = "created"

If person found:

Calls: update_person_identifiers() from processing_update_funtions/people_update.py
What it does: Updates person fields and adds new identifiers (non-destructive)
Sets status_person = "updated"

6. Determine Lead Status and Action (if person_id and company_id exist)

If lead NOT found:

Calls: create_lead_record() from processing_push_funtions/lead_push.py
Sets status_lead = "created"

If lead found:

Calls: update_lead_identifiers() from processing_update_funtions/lead_update.py
What it does: Updates lead fields and adds new email identifiers (non-destructive)
Sets status_lead = "updated"

7. Determine People Workspace Status and Action (if workspace_id and person_id exist)

If people workspace NOT found:

Calls: create_people_workspace_connection() from processing_push_funtions/people_workspace_push.py
Sets status_people_workspace = "created"

If people workspace found:

Calls: update_people_workspace() from processing_update_funtions/people_workspace_update.py
What it does: Updates workspace fields (non-destructive)
Sets status_people_workspace = "updated"

8. Determine Lead Workspace Status and Action (if workspace_id and lead_id exist)

If lead workspace NOT found:

Calls: create_lead_workspace_connection() from processing_push_funtions/lead_workspace_push.py
Sets status_lead_workspace = "created"

If lead workspace found:

Calls: update_lead_workspace() from processing_update_funtions/lead_workspace_update.py
What it does: Updates workspace qualification status
Sets status_lead_workspace = "updated"

9. Return Response

Returns: {person_id, lead_id, people_workspace_id, lead_workspace_id, status_person, status_lead, status_people_workspace, status_lead_workspace, error}

All status values: "created" or "updated"

Error Handling in API File

The process_contact_push_patch() function handles errors at multiple stages:

Validation:

If no data provided: Returns error "No data provided"

Lookup Error:

If lookup_contact() returns error: Returns all null IDs with error

Push/Update Errors (cascading returns):

If people_push/update fails: Returns error immediately
If lead_push/update fails: Returns person info but error for lead
If people_workspace_push/update fails: Returns person and lead info but error for people workspace
If lead_workspace_push/update fails: Returns all info except lead workspace

Called Functions (Brief Description):

clean_contact_fields(): Normalizes contact data
lookup_contact(): Searches database for contact
create_person_record(): Creates person with identifiers
update_person_identifiers(): Updates person fields and adds new identifiers
create_lead_record(): Creates lead with email identifiers
update_lead_identifiers(): Updates lead fields and adds new email identifiers
create_people_workspace_connection(): Creates workspace link for person
update_people_workspace(): Updates people workspace fields
create_lead_workspace_connection(): Creates workspace link for lead
update_lead_workspace(): Updates lead workspace qualification status

POST

/company_delete_fields

Purpose: Delete specific fields from company records in db_companies_main, db_companies_dt_identifiers, and db_companies_workspace.

Logic Flow

1. Receive Request Data

Required: company_id
Optional: Fields to delete. Pass true for a field that should be set to NULL, or pass the value(s) to remove for array and identifier fields

2. Validate Data

Checks that company_id is provided
Returns error if company_id missing

3. Get Database Connection

Gets connection from pool
Creates cursor for queries

4. Part 1: Handle Boolean Fields

For fields marked as true in request
Sets fields to NULL in db_companies_main
Fields: company_name_cleaned, legal_form, address fields, registration numbers, employee counts, etc.
Executes: UPDATE db_companies_main SET field = NULL WHERE id = company_id

5. Part 2: Handle Array Fields

For array fields with values (company_tags, company_sources)
Parses comma-separated values
Removes each item from array
Executes: UPDATE db_companies_main SET field = array_remove(field, item) WHERE id = company_id

6. Part 3: Handle Identifier Fields

For identifier fields with values (name, domain, linkedin, emails, phones, social media)
Deletes matching records from db_companies_dt_identifiers
Executes: DELETE FROM db_companies_dt_identifiers WHERE companies_main_id = company_id AND identifier = value AND type = type

7. Part 4: Handle Workspace Fields (if company_workspace_id provided)

For company_qualified field: Sets to NULL
For company_custom_tags_ws: Removes items from array
Executes on db_companies_workspace table

8. Commit and Return

Commits all changes
Returns success with list of operations performed
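Parts 1 and 2 amount to translating the request payload into UPDATE statements. The sketch below builds parameterized (sql, params) pairs without touching a database. The allow-lists are a small illustrative subset, the %s placeholder style follows psycopg2, and the helper name is hypothetical.

```python
# Illustrative subsets; the real endpoint supports more fields.
NULLABLE_FIELDS = {"company_name_cleaned", "company_legal_form", "company_city"}
ARRAY_FIELDS = {"company_tags", "company_sources"}


def build_company_delete_statements(payload):
    """Translate a delete-fields payload into (sql, params) pairs."""
    company_id = payload.get("company_id")
    if not company_id:
        raise ValueError("company_id is required")

    statements = []
    for field, value in payload.items():
        if field in NULLABLE_FIELDS and value is True:
            # Column names come from the allow-list, never from user input.
            statements.append(
                (f"UPDATE db_companies_main SET {field} = NULL WHERE id = %s",
                 (company_id,))
            )
        elif field in ARRAY_FIELDS and isinstance(value, str):
            # Comma-separated items, each removed with array_remove().
            for item in (part.strip() for part in value.split(",")):
                if item:
                    statements.append(
                        (f"UPDATE db_companies_main SET {field} = "
                         f"array_remove({field}, %s) WHERE id = %s",
                         (item, company_id))
                    )
    return statements


stmts = build_company_delete_statements(
    {"company_id": "uuid-123", "company_city": True, "company_tags": "AI, Safety"}
)
for sql, params in stmts:
    print(sql, params)
```

Values are passed as parameters while column names are interpolated only after the allow-list check, which keeps request data out of the SQL text.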

Error Handling in API File

The process_company_delete_fields() function uses try/except with rollback:

Try Block:

Validates company_id is provided
Gets database connection from pool
Executes multiple UPDATE and DELETE queries
Tracks operations performed
Commits transaction

Except Block - Database Errors:

Catches psycopg2.Error (database-specific errors)
Rolls back transaction if error occurs
Closes cursor if open
Releases connection back to pool
Returns: (False, "Database error: [error details]")

Except Block - General Errors:

Catches any other Exception
Rolls back transaction if error occurs
Closes cursor if open
Releases connection back to pool
Returns: (False, "Error: [error details]")

Finally Logic (implicit in except blocks):

Always attempts to close cursor
Always attempts to rollback on error
Always releases connection back to pool

No Called Processing Functions: This file directly executes database queries without calling separate processing modules.

POST

/contact_delete_fields

Purpose: Delete specific fields from contact records in db_people, db_leads, db_people_identifiers, db_leads_identifiers, and workspace tables.

Logic Flow

1. Receive Request Data

Required: At least one of people_id or lead_id
Optional: people_workspace_id, leads_workspace_id, fields to delete

2. Validate Data

Checks that at least people_id OR lead_id is provided
Returns error if both missing

3. Get Database Connection

Gets connection from pool
Creates cursor for queries

4. Part 1: Handle People Boolean Fields (if people_id provided)

For fields marked as true in request
Sets fields to NULL in db_people
Fields: name, gender, language, birth info, location, LinkedIn data, etc.
Executes: UPDATE db_people SET field = NULL WHERE id = people_id

5. Part 2: Handle Lead Boolean Fields (if lead_id provided)

For fields marked as true in request
Sets fields to NULL in db_leads
Fields: position, seniority, department, dates, summary, etc.
Executes: UPDATE db_leads SET field = NULL WHERE id = lead_id

6. Part 3: Handle Lead Array Fields (if lead_id provided)

For lead_sources array field
Parses comma-separated values
Removes each item from array
Executes: UPDATE db_leads SET lead_sources = array_remove(lead_sources, item) WHERE id = lead_id

7. Part 4: Handle People Identifiers (if people_id provided)

For identifier fields with values (contact_linkedin, contact_xing)
Deletes matching records from db_people_identifiers
Executes: DELETE FROM db_people_identifiers WHERE people_id = people_id AND contact_ident_identifier = value AND contact_ident_type = type

8. Part 5: Handle Lead Identifiers (if lead_id provided)

For email identifier fields with values (valid, catch_all, invalid, unsure)
Deletes matching records from db_leads_identifiers
Executes: DELETE FROM db_leads_identifiers WHERE leads_id = lead_id AND lead_ident_identifier = email AND lead_ident_type = 'email' AND lead_ident_status = status

9. Part 6: Handle People Workspace (if people_workspace_id provided)

Note: db_people_workspace only contains IDs, no other fields to delete
Logs operation performed

10. Part 7: Handle Leads Workspace (if leads_workspace_id provided)

For lead_qualified_ws field: Sets to NULL
Executes: UPDATE db_leads_workspace SET lead_qualified_ws = NULL WHERE id = leads_workspace_id

11. Commit and Return

Commits all changes
Returns success with list of operations performed

Error Handling in API File

The process_contact_delete_fields() function uses try/except with rollback:

Try Block:

Validates at least one ID is provided
Gets database connection from pool
Executes multiple UPDATE and DELETE queries across multiple tables
Tracks operations performed
Commits transaction

Except Block - Database Errors:

Catches psycopg2.Error (database-specific errors)
Rolls back transaction if error occurs
Closes cursor if open
Releases connection back to pool
Returns: (False, "Database error: [error details]")

Except Block - General Errors:

Catches any other Exception
Rolls back transaction if error occurs
Closes cursor if open
Releases connection back to pool
Returns: (False, "Error: [error details]")

Finally Logic (implicit in except blocks):

Always attempts to close cursor
Always attempts to rollback on error
Always releases connection back to pool

No Called Processing Functions: This file directly executes database queries without calling separate processing modules.

Key Differences Between Endpoints

Push vs Push/Patch

Push Endpoints

Found: Returns "found" status, NO update
Not Found: Creates new record, returns "created" status

Push/Patch Endpoints

Found: Updates existing record, returns "updated" status
Not Found: Creates new record, returns "created" status

Update Behavior (Push/Patch only)

All update functions are non-destructive:

Existing fields: Keep current values
New fields: Add new values
Arrays: Append new items (don't remove existing)
Identifiers: Add new identifiers (don't remove existing)
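One way to read these rules is as a merge that only fills gaps. The sketch below is an interpretation, not the actual update code: it treats None as "no current value", leaves populated scalar fields untouched, and appends unseen items to arrays.

```python
def merge_non_destructive(existing: dict, incoming: dict) -> dict:
    """Merge incoming values into existing without overwriting or removing."""
    merged = dict(existing)
    for field, new_value in incoming.items():
        current = merged.get(field)
        if isinstance(current, list) and isinstance(new_value, list):
            # Arrays: append items that are not present yet, keep the rest.
            merged[field] = current + [v for v in new_value if v not in current]
        elif current is None:
            # Empty or unknown field: take the new value.
            merged[field] = new_value
        # Otherwise the field already has a value and is left as-is.
    return merged


existing = {"company_city": "Berlin", "company_zip": None, "company_tags": ["AI"]}
incoming = {"company_city": "Munich", "company_zip": "10115",
            "company_tags": ["AI", "Research"]}
merged = merge_non_destructive(existing, incoming)
print(merged)
```

Here company_city stays "Berlin", company_zip is filled in, and company_tags gains "Research" without losing "AI".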

Delete Behavior

Delete endpoints are selective:

Boolean fields with value true: Set to NULL
Array fields with values: Remove only specified items
Identifier fields with values: Delete only specified identifiers
Uses explicit DELETE queries for identifier tables

Error Handling Patterns

Push/Push-Patch Files

Pattern: Cascading returns with partial success

If early operation fails: Return error immediately
If later operation fails: Return successful IDs but error for failed operation

Example:

If person created but lead fails, return person_id and status_person, but error for lead

Delete Files

Pattern: All-or-nothing transaction with rollback

If any operation fails: Rollback entire transaction
Return success only if all operations commit
Uses explicit transaction management

Connection Management

All Files:

Get connection from shared pool
Always release connection back to pool (even on error)
Use try/except/finally pattern for cleanup

Summary: Lookup vs Get

Lookup Endpoints (Search)

Purpose: Find if something exists
Input: Identifiers (name, domain, email, etc.)
Output: IDs only
Use Case: "Does this company/contact exist in our database?"
Caching: Yes (1 hour)
Returns null when not found: This is normal, not an error

Get Endpoints (Retrieve)

Purpose: Get complete information
Input: ID (company_id or lead_id)
Output: All data fields, all identifiers, workspace connections
Use Case: "Give me everything you know about this company/contact"
Caching: No
Returns error when not found: ID doesn't exist in database

Typical Workflows

Workflow 1: Check if company exists, then get details

1. POST /company_lookup with domain="anthropic.com"
2. Response: {company_main_id: "uuid-123", ...}
3. GET /company_get?company_id=uuid-123
4. Response: {complete company data...}

Workflow 2: Search for contact, then get full info

1. POST /contact_lookup with email="john@company.com" and company_id="company-uuid"
2. Response: {lead_id: "lead-456", person_id: "person-789", ...}
3. GET /contact_get?lead_id=lead-456
4. Response: {complete contact data including lead and person info...}

Workflow 3: Check workspace membership

1. POST /company_lookup with domain="example.com" and workspace_id="ws-123"
2. Response: {company_main_id: "uuid-123", company_workspace_id: null, ...}
3. Interpretation: Company exists but is NOT in workspace ws-123
4. Next step: Use company_push to add it to the workspace
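
The lookup-then-get sequence from Workflow 1 could be prepared with Python's standard library roughly as shown here. The sketch only builds the requests and does not send them. The body field name "domain" is taken from the workflow text and may differ from the real request schema, and the helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://web-production-603a8.up.railway.app"


def build_lookup_request(api_key, domain):
    """Build the POST /company_lookup request (body field name is assumed)."""
    body = json.dumps({"domain": domain}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/company_lookup",
        data=body,
        method="POST",
        headers={"api_key": api_key, "Content-Type": "application/json"},
    )


def build_get_request(api_key, company_id):
    """Build the GET /company_get request for an ID returned by the lookup."""
    query = urllib.parse.urlencode({"company_id": company_id})
    return urllib.request.Request(
        f"{BASE_URL}/company_get?{query}",
        method="GET",
        headers={"api_key": api_key},
    )
```

Passing either request to urllib.request.urlopen() would send it. A null company_main_id in the lookup response means there is nothing to fetch.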

Common Questions

Q: When should I use workspace_id?

Use workspace_id in lookups when:

You want to check if a company/contact is already in a specific workspace
You're building workspace-specific features (like workspace dashboards)

Use workspace_id in get calls when:

You want workspace-specific data (qualification status, custom tags)
You're displaying data in a workspace context

Don't use workspace_id when:

You're doing a global search across all workspaces
You want to find all instances of a company/contact regardless of workspace

Q: Why does contact lookup return person_id without lead_id sometimes?

This happens in Phase 2 of contact lookup:

You searched for a person (by LinkedIn/Xing) without providing company_id, OR
You provided company_id but the person doesn't work at that company
The system found the person in the database but couldn't find a lead (company connection)

This means: "We know this person exists, but we don't have employment data for the company you specified"

Q: What's the difference between company_name and company_name_cleaned?

company_name: Stored in db_companies_dt_identifiers - can have multiple variations (old names, alternate spellings)
company_name_cleaned: Stored in db_companies_main - the current, standardized company name

Example:

company_names array: ["Facebook", "Facebook Inc", "Meta"]
company_name_cleaned: "Meta Platforms Inc"

Q: Why are identifiers in separate arrays by type?

For historical tracking:

Company domains change (rebrandings, mergers)
People change names (marriage, legal name changes)
Email addresses change (job changes)

For matching flexibility:

Search with an old domain and still find the company under its current domain
Search with an old email and still find the person under their current email
Match against ANY stored identifier

Q: Can I search for a person without knowing their company?

Yes, in contact_lookup:

Don't provide company_id
Provide LinkedIn or Xing URL
The lookup will run Phase 2 (person-only search)
You'll get person_id but not lead_id

This means: "Found the person, but don't know where they work"

Q: What happens if I provide multiple identifiers that match different companies?

The system returns the first match found:

Example: domain="company-a.com", linkedin="linkedin.com/company/company-b"
If company A is found first in the database, the lookup returns company A
No ranking or scoring is performed

Best practice: Use the most specific/reliable identifier you have

Database Tables Reference

Company Tables

v_company_lookup

Optimized view for searching companies (contains arrays of identifiers)

db_companies_main

Main company data (single record per company)

db_companies_dt_identifiers

Company identifiers (multiple records per company, one per identifier)

db_companies_workspace

Workspace connections (one record per company-workspace pair)

Contact Tables

v_contact_lookup

Optimized view for searching contacts (contains arrays of identifiers)

db_leads

Lead data - person at company (one record per person-company relationship)

db_people

Person data - individual information (single record per person)

db_people_identifiers

Person identifiers like LinkedIn/Xing (multiple records per person)

db_leads_identifiers

Lead identifiers like emails (multiple records per lead)

db_leads_workspace

Lead workspace connections (one record per lead-workspace pair)

db_people_workspace

Person workspace connections (one record per person-workspace pair)

Table Relationships

Company Structure:

db_companies_main (1) ←→ (many) db_companies_dt_identifiers
                  (1) ←→ (many) db_companies_workspace

Contact Structure:

db_people (1) ←→ (many) db_people_identifiers
          (1) ←→ (many) db_people_workspace
          (1) ←→ (many) db_leads

db_leads (1) ←→ (many) db_leads_identifiers
         (1) ←→ (many) db_leads_workspace
         (many) ←→ (1) db_companies_main

Key Points:

One person can have many leads (worked at multiple companies)
One lead belongs to one person and one company
Identifiers are stored separately for historical tracking
Workspace connections are separate for each entity

Functions

Processing functions used across API endpoints for data cleaning, validation, and normalization.

This document explains the logic flow for all processing functions organized by functional area. Each section describes what each function does, its complete logic flow, database operations, error handling, and return values.

1. Processing Cleaning Functions

1.1 Contact Field Cleaning (contact_field_cleaning.py)

Purpose:

Normalizes and validates contact/lead field data before database operations.

normalize_quotes(text: str) → str

Purpose:

Converts all Unicode quote characters to standard ASCII quotes.

Logic Flow:

1. Check if input is string type, return as-is if not
2. Replace all smart quotes and quote-like characters:
• Right single quotation mark (U+2019) → apostrophe
• Left single quotation mark (U+2018) → apostrophe
• Grave accent/backtick (U+0060) → apostrophe
• Acute accent (U+00B4) → apostrophe
• Left double quotation mark (U+201C) → double quote
• Right double quotation mark (U+201D) → double quote
3. Return normalized text

Database Operations:

None

Return Value:

Normalized string with standard ASCII quotes
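
A minimal sketch of the replacement table described above, assuming a simple character translation is sufficient. It illustrates the logic flow and is not the actual contact_field_cleaning.py source.

```python
# Mapping of quote-like characters to ASCII quotes, as listed in the logic flow
_QUOTE_MAP = str.maketrans({
    "\u2019": "'",  # right single quotation mark
    "\u2018": "'",  # left single quotation mark
    "\u0060": "'",  # grave accent / backtick
    "\u00b4": "'",  # acute accent
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
})


def normalize_quotes(text):
    """Return text with quote-like characters replaced by ASCII quotes."""
    if not isinstance(text, str):
        return text  # step 1: non-string input is returned unchanged
    return text.translate(_QUOTE_MAP)
```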

clean_contact_fields(contact_data: dict) → dict

Purpose:

Validates and normalizes all contact fields including social media URLs and removes empty values.

Logic Flow:

1. Normalize Quotes in All String Fields - Iterate through all fields and call normalize_quotes() for each string value
2. Clean LinkedIn URL - Extract person slug, validate format, format as https://www.linkedin.com/in/{slug}
3. Clean Xing URL - Extract person slug, validate format, format as https://www.xing.com/people/{slug}
4. Remove Empty Fields - Delete all fields with None, empty string, or blank values

Example:

Input:

{"contact_linkedin": "https://linkedin.com/in/john-doe?param=123", "contact_email_valid": "john@example.com", "contact_xing": ""}

Output:

{"contact_linkedin": "https://www.linkedin.com/in/john-doe", "contact_email_valid": "john@example.com"}
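
Steps 2 to 4 can be sketched as follows. The slug pattern and the URL markers used to locate the slug are assumptions made for illustration, and quote normalization (step 1) is omitted to keep the sketch short.

```python
import re

# Assumed slug rule: letters, digits, hyphen, underscore, percent sign
_SLUG_RE = re.compile(r"^[A-Za-z0-9\-_%]+$")


def _extract_slug(url, marker):
    """Return the path segment after `marker`, or None if missing or invalid."""
    if not isinstance(url, str) or marker not in url:
        return None
    tail = url.split(marker, 1)[1]
    slug = re.split(r"[/?#]", tail, maxsplit=1)[0].strip()
    return slug if _SLUG_RE.match(slug) else None


def clean_contact_fields(contact_data):
    """Canonicalize profile URLs and drop empty values (sketch)."""
    cleaned = dict(contact_data)
    # Steps 2 and 3: rebuild profile URLs in canonical form
    for field, marker, template in (
        ("contact_linkedin", "linkedin.com/in/", "https://www.linkedin.com/in/{}"),
        ("contact_xing", "xing.com/people/", "https://www.xing.com/people/{}"),
    ):
        if cleaned.get(field):
            slug = _extract_slug(cleaned[field], marker)
            cleaned[field] = template.format(slug) if slug else None
    # Step 4: drop None, empty, and whitespace-only values
    return {
        key: value for key, value in cleaned.items()
        if value is not None and not (isinstance(value, str) and not value.strip())
    }
```

With the example input above, this sketch strips the query string from the LinkedIn URL and removes the empty contact_xing field.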

1.2 Company Field Cleaning (company_field_cleaning.py)

Purpose:

Normalizes and validates company field data including domains and social media URLs.

clean_company_fields(company_data: dict) → dict

Purpose:

Validates and normalizes all company fields including domains and social media URLs.

Logic Flow:

1. Normalize and Strip All String Fields
2. Clean LinkedIn URL - Extract company slug, format as https://www.linkedin.com/company/{slug}
3. Clean Domain - Remove prefixes (https://, www.), store only base domain
4. Clean Xing URL - Format as https://www.xing.com/pages/{slug}
5. Clean Instagram URL - Format as https://www.instagram.com/{slug}
6. Clean Facebook URL - Format as https://www.facebook.com/{slug}
7. Clean Pinterest URL - Format as https://de.pinterest.com/{slug}
8. Clean TikTok URL - Format as https://www.tiktok.com/{slug}
9. Clean YouTube URL - Format as https://www.youtube.com/{slug}
10. Clean Twitter URL - Format as https://x.com/{slug}
11. Remove Empty Fields

Note: URL parsing for each platform is wrapped in its own try/except block. Malformed URLs are set to None.
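
Step 3 (domain cleaning) might look roughly like this. Lowercasing, the http:// prefix, and the final sanity check are assumptions that go beyond what the logic flow states.

```python
def clean_domain(value):
    """Reduce a URL or host string to a bare domain, or None if unusable."""
    if not isinstance(value, str):
        return None
    domain = value.strip().lower()
    # Remove the scheme prefix, then a leading "www."
    for prefix in ("https://", "http://"):
        if domain.startswith(prefix):
            domain = domain[len(prefix):]
    if domain.startswith("www."):
        domain = domain[4:]
    # Keep only the host part: cut at the first path, query, or fragment delimiter
    for separator in ("/", "?", "#"):
        domain = domain.split(separator, 1)[0]
    # Assumed sanity check: a usable domain contains a dot and no spaces
    if "." not in domain or " " in domain:
        return None
    return domain
```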

Field Cleaning Detailed Documentation: For comprehensive field cleaning algorithms and platform-specific rules, see the Field Cleaning section earlier in this documentation.

2. Processing CSV Functions

2.1 Fetch CSV Fields (fetch_csv_fields.py)

Purpose:

Fetches and parses specific rows from CSV files stored in Supabase Storage with header mapping.

Connection Management:

• _shared_pool: Global variable set by main script
• set_connection_pool(pool): Sets the shared pool
• get_pg_connection(): Gets connection from pool
• release_pg_connection(conn): Returns connection to pool

fetch_csv_batch(file_name, start_row, end_row, csv_header_rows)

Logic Flow:

1. Construct File Path - Build path: csv_uploads/{file_name}.csv
2. Check File Exists - Query storage.objects table
3. Download CSV File - HTTP GET from Supabase Storage
4. Parse CSV Content - Create CSV reader from StringIO
5. Validate Row Indices - Check bounds and adjust end_row
6. Extract Batch Rows - Map column values to header names if mapping provided
7. Return Batch Result - Include file_name, total_rows, range, and row data
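
Steps 4 to 7 can be illustrated without storage access by parsing CSV text that is already in memory. The function name extract_batch, the 0-based inclusive row indices, and the shape of the returned dictionary are assumptions for this sketch.

```python
import csv
import io


def extract_batch(csv_text, start_row, end_row, header_names=None):
    """Parse CSV text and return rows start_row..end_row (0-based, inclusive)."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    total_rows = len(rows)
    if total_rows == 0 or not 0 <= start_row < total_rows or end_row < start_row:
        return {"error": f"Invalid row index; valid range is 0-{max(total_rows - 1, 0)}"}
    end_row = min(end_row, total_rows - 1)  # step 5: clamp end_row to the file
    batch = []
    for index in range(start_row, end_row + 1):
        values = rows[index]
        if header_names:
            # Step 6: pair each column value with its header name
            batch.append({"row_index": index, "data": dict(zip(header_names, values))})
        else:
            batch.append({"row_index": index, "data": values})
    return {"total_rows": total_rows, "start_row": start_row,
            "end_row": end_row, "rows": batch}
```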

Database Operations:

• Table: storage.objects
• Query: SELECT to check file existence
• Connection: Uses shared pool, released after query

Error Scenarios:

• Database Error: Returns error message, releases connection
• File Not Found: Returns error "File not found in storage"
• HTTP Error: Returns error with status code
• Invalid Row Index: Returns error with valid range

2.2 Fetch Upload Job (fetch_upload_job.py)

Purpose:

Retrieves upload job metadata from ev_upload_jobs table with fallback to HTTP API.

Heroku Compatibility: Supports temporary connections before pool initialization. Falls back to HTTP API if PostgreSQL fails.

fetch_upload_job_by_id(job_id: str)

Fetches upload job data using PostgreSQL first, HTTP API as fallback.

Logic Flow:

1. Try PostgreSQL fetch via _fetch_via_postgres()
2. If fails, try HTTP fetch via _fetch_via_http()
3. Return result or None if both methods fail
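
The fallback order can be expressed generically as shown here. fetch_with_fallback is a hypothetical helper; in the real module the two fetchers would be _fetch_via_postgres and _fetch_via_http. Treating a raised exception as a failed fetch is an assumption.

```python
def fetch_with_fallback(job_id, fetchers):
    """Try each fetcher in order; return the first non-None result, else None."""
    for fetch in fetchers:
        try:
            result = fetch(job_id)
        except Exception:
            result = None  # a raised error counts as a failed attempt
        if result is not None:
            return result
    return None
```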

_fetch_via_postgres(job_id: str)

Logic Flow:

1. Get connection from pool with RealDictCursor
2. Query ev_upload_jobs by ID (id, workspace_id, user_id, data_type, job_type, mapped_csv_fields, csv_header_rows, etc.)
3. Filter csv_header_rows to keep only the row at csv_header_row_position
4. Check CSV file existence via _check_csv_exists()
5. Reorder result dictionary (core fields first, metadata last)
6. Release connection and return ordered result

Database Operations:

• Tables: ev_upload_jobs, storage.objects
• Query: SELECT by ID
• Connection: Uses shared pool

2.3 Update Progress (update_progress.py)

Purpose:

Updates job status and progress data in ev_upload_jobs table.

update_job_status(job_id, job_status)

Updates the job_status field in ev_upload_jobs.

Query:

UPDATE ev_upload_jobs SET job_status = %s WHERE id = %s

Return: Tuple (success: bool, error_message: str or None)

update_job_progress(job_id, progress_data)

Updates the progress_data JSONB field in ev_upload_jobs.

Query:

UPDATE ev_upload_jobs SET progress_data = %s WHERE id = %s

Return: Tuple (success: bool, error_message: str or None)

2.4 Error CSV Push (error_csv_push.py)

Purpose:

Formats and uploads error CSV files to Supabase Storage csv_wrong folder.

push_error_csv(all_csv_errors, csv_file_name)

Main function: formats the error CSV, checks whether an older error file exists, deletes it if found, and uploads the new file.

Logic Flow:

1. Initialize Result Dictionary
2. Format Errors to CSV - Call format_errors_to_csv() with row_index and errors as the first columns
3. Check if File Exists - Query storage.objects
4. Delete Existing File - If found, delete via _delete_error_csv()
5. Upload New Error CSV - POST to Supabase Storage
6. Return Result - Complete status of all operations

Return Value:

{error_file_found: bool, error_file_deleted: str|None, file_created: bool, file_name: str, message: str}
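
Step 2 of the logic flow (formatting) might be sketched like this. The signature and the input shape, a list of dictionaries with row_index, errors, and row keys, are assumptions. Only the column order, with row_index and errors first, comes from the description above.

```python
import csv
import io


def format_errors_to_csv(all_csv_errors, header_names):
    """Render error rows as CSV text with row_index and errors as the first columns."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["row_index", "errors"] + list(header_names))
    for entry in all_csv_errors:
        row = entry.get("row", {})
        writer.writerow(
            [entry["row_index"], "; ".join(entry["errors"])]
            + [row.get(name, "") for name in header_names]
        )
    return buffer.getvalue()
```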

3. Processing Lookup Functions

3.1 Contact Lookup (contact_lookup.py)

Purpose:

Searches for existing contacts/leads in v_contact_lookup view with two-phase lookup strategy.

lookup_contact(contact_data, company_id, workspace_id)

Two-phase lookup for contacts - first with company context, then without.

Phase 1: Lead Lookup (with company context)

• Email - Check all email types (valid, catch_all, invalid, unsure) against email_array
• LinkedIn + company_id - Check linkedin_array with lead_companies_main_id match
• Xing + company_id - Check xing_array with lead_companies_main_id match
• Name + company_id - Check first_name/last_name variations with company match

Phase 2: Person Lookup (fallback without company)

• LinkedIn alone - Check linkedin_array without company constraint
• Xing alone - Check xing_array without company constraint

Match Strategy: Phase 1 prioritizes lead matches (person at specific company). Phase 2 falls back to person matches (any company). Email has highest priority across all companies.

Return Value:

Tuple (lead_id, person_id, lead_workspace_id, people_workspace_id, lookup_result, error_message)

• Phase 1 success: Returns lead_id and person_id
• Phase 2 success: Returns person_id only (lead_id is None)
• Not found: Returns all None (not an error)
• Error: Returns all None with error message
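
The two phases can be illustrated over an in-memory list of records instead of the v_contact_lookup view. This is a simplified sketch: it covers only email and LinkedIn, returns just (lead_id, person_id), and the record layout is an assumption modeled on the column names mentioned above.

```python
def lookup_contact_sketch(records, email=None, linkedin=None, company_id=None):
    """Return (lead_id, person_id) using a lead phase, then a person-only phase."""
    # Phase 1: lead match, email first
    if email:
        for rec in records:
            if email in rec["email_array"]:
                return rec["lead_id"], rec["person_id"]
    # Phase 1: LinkedIn only counts together with a matching company
    if linkedin and company_id:
        for rec in records:
            if (linkedin in rec["linkedin_array"]
                    and rec["lead_companies_main_id"] == company_id):
                return rec["lead_id"], rec["person_id"]
    # Phase 2: person match on LinkedIn alone, no company constraint
    if linkedin:
        for rec in records:
            if linkedin in rec["linkedin_array"]:
                return None, rec["person_id"]
    return None, None  # not found is a normal outcome, not an error
```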

Database Operations:

• View: v_contact_lookup (main search)
• Tables: db_leads_workspace, db_people_workspace (workspace lookups)
• Queries: SELECT with parameterized OR conditions
• Connection: Uses shared pool, always released

3.2 Company Lookup (company_lookup.py)

Purpose:

Searches for existing companies in v_company_lookup view.

lookup_company(company_name, company_domain, company_linkedin, workspace_id)

Single-phase lookup for companies using ANY identifier match.

Lookup Conditions (OR logic):

• Name: company_name_array match
• Domain: company_domain_array match
• LinkedIn: company_linkedin_array match

Match Strategy: Uses OR logic (any identifier matches). Returns first match (LIMIT 1). No ranking or priority between identifiers.

Return Value:

Tuple (company_main_id, company_workspace_id, error_message)

• Found: Returns company_main_id (and company_workspace_id if workspace_id provided)
• Not found: Returns (None, None, None)
• Error: Returns (None, None, error_message)
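
One way to assemble the OR conditions as a parameterized query is sketched below. The array column names come from the lookup conditions above. The selected column company_main_id and the helper name are assumptions about the view, not confirmed schema.

```python
def build_company_lookup_query(company_name=None, company_domain=None,
                               company_linkedin=None):
    """Return (sql, params) for an OR lookup, or (None, ()) without identifiers."""
    candidates = (
        ("company_name_array", company_name),
        ("company_domain_array", company_domain),
        ("company_linkedin_array", company_linkedin),
    )
    conditions, params = [], []
    for column, value in candidates:
        if value:
            # Column names are fixed strings; only values travel as parameters
            conditions.append(f"%s = ANY({column})")
            params.append(value)
    if not conditions:
        return None, ()
    sql = ("SELECT company_main_id FROM v_company_lookup WHERE "
           + " OR ".join(conditions) + " LIMIT 1")
    return sql, tuple(params)
```

Because there is no ORDER BY, LIMIT 1 returns whichever matching row the database produces first, which is consistent with the "no ranking" behavior described above.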

Database Operations:

• View: v_company_lookup (main search)
• Table: db_companies_workspace (workspace lookup)
• Queries: SELECT with LIMIT 1 (returns first match)
• Connection: Uses shared pool, always released

4. Processing Push Functions

4.1 People Push (people_push.py)

Purpose:

Creates new person records in db_people and identifiers in db_people_identifiers.

create_person_record(contact_data, http_client)

Logic Flow:

1. Map Fields for Insertion - Extract non-null values for names, demographics, birth, location, LinkedIn, career, etc.
2. Validate Data - Return error if no fields to insert
3. Build INSERT Query - Parameterized query with RETURNING *
4. Execute INSERT - Insert into db_people, commit, extract person_id
5. Track Identifiers - Collect contact_linkedin → 'linkedin', contact_xing → 'xing'
6. Batch Insert Identifiers - Use executemany for db_people_identifiers
7. Return - (person_id, None) on success

Database Operations:

• Tables: db_people (main), db_people_identifiers (identifiers)
• Queries: INSERT with RETURNING, Batch INSERT for identifiers
• Transaction: Committed after each operation

4.2 Lead Push (lead_push.py)

Purpose:

Creates new lead records in db_leads connecting person to company, with email identifiers.

create_lead_record(contact_data, person_id, company_main_id, http_client)

Logic Flow:

1. Map Regular Fields - Position, classification, timeline, summary
2. Map Array Fields - lead_sources (convert string to array if needed)
3. Build INSERT Query - Include people_id and companies_main_id references
4. Execute INSERT - Insert into db_leads, commit, extract lead_id
5. Track Email Identifiers - For each email field, determine status (valid, catch_all, invalid, unsure)
6. Batch Insert Emails - Use executemany for db_leads_identifiers with status
7. Return - (lead_id, None) on success
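
Step 5 (deriving a status from the email field name) could be sketched as follows. Only contact_email_valid appears elsewhere in this documentation; the other three field names and the row layout are assumptions for illustration.

```python
# Field names other than contact_email_valid are assumed for illustration
EMAIL_FIELD_STATUS = {
    "contact_email_valid": "valid",
    "contact_email_catch_all": "catch_all",
    "contact_email_invalid": "invalid",
    "contact_email_unsure": "unsure",
}


def collect_email_rows(contact_data, lead_id):
    """Build (lead_id, identifier, type, status) rows for a batch insert."""
    rows = []
    for field, status in EMAIL_FIELD_STATUS.items():
        email = contact_data.get(field)
        if email:
            rows.append((lead_id, email, "email", status))
    return rows
```

The resulting list has the shape that cursor.executemany() expects as its second argument.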

Database Operations:

• Tables: db_leads (main), db_leads_identifiers (emails)
• Queries: INSERT with RETURNING, Batch INSERT for emails
• Transaction: Committed after each operation

4.3 People Workspace Push (people_workspace_push.py)

Purpose:

Creates workspace connections for people in db_people_workspace.

create_people_workspace_connection(people_id, workspace_id, http_client)

Creates people-workspace connection record.

Query:

INSERT INTO db_people_workspace (people_id, workspace_id) VALUES (%s, %s) RETURNING id

Return: Tuple (people_workspace_id, error_message)

4.4 Lead Workspace Push (lead_workspace_push.py)

Purpose:

Creates workspace connections for leads in db_leads_workspace with optional qualification.

create_lead_workspace_connection(lead_id, workspace_id, lead_qualified_ws, http_client)

Creates lead-workspace connection with optional qualification status.

Logic:

• Always include: leads_id, workspace_id
• If lead_qualified_ws provided: Add to fields and params
• Build dynamic INSERT query with RETURNING id

Return: Tuple (lead_workspace_id, error_message)
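
The dynamic INSERT described above can be sketched as a small query builder. The helper name is hypothetical, and the column name lead_qualified_ws is taken from the UPDATE statement in section 5.3.

```python
def build_lead_workspace_insert(lead_id, workspace_id, lead_qualified_ws=None):
    """Return (sql, params) for the lead-workspace INSERT."""
    fields = ["leads_id", "workspace_id"]
    params = [lead_id, workspace_id]
    if lead_qualified_ws is not None:
        # Optional qualification: added only when a value was provided
        fields.append("lead_qualified_ws")
        params.append(lead_qualified_ws)
    placeholders = ", ".join(["%s"] * len(fields))
    sql = (f"INSERT INTO db_leads_workspace ({', '.join(fields)}) "
           f"VALUES ({placeholders}) RETURNING id")
    return sql, tuple(params)
```

Checking against None rather than truthiness keeps an explicit False qualification in the query.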

4.5 Company Workspace Push (company_workspace_push.py)

Purpose:

Creates workspace connections for companies in db_companies_workspace.

push_company_workspace(company_data, workspace_id, company_main_id)

Creates company-workspace connection with optional qualification and tags.

Fields:

• Always: workspaces_id, companies_main_id
• Optional: company_qualified (boolean)
• Optional: company_custom_tags_ws (text[] - converted from comma-separated string)

Return: Tuple (company_workspace_id, error_message)
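
The conversion of company_custom_tags_ws from a comma-separated string to a list might look like this. Trimming whitespace and dropping duplicates are assumptions, and parse_tags is a hypothetical helper name.

```python
def parse_tags(value):
    """Turn a comma-separated string (or a list) into a clean list of tags."""
    if isinstance(value, list):
        items = value
    elif isinstance(value, str):
        items = value.split(",")
    else:
        return []
    tags = []
    for item in items:
        tag = str(item).strip()
        if tag and tag not in tags:  # skip blanks and repeated tags
            tags.append(tag)
    return tags
```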

4.6 Company Push (company_push.py)

Purpose:

Creates new company records in db_companies_main and identifiers in db_companies_dt_identifiers.

push_company(company_data)

Logic Flow:

1. Define Identifier Mappings - name, domain, linkedin, email, phone, social media
2. Map Main Table Fields - Company details, address, registration, metrics, media
3. Map Array Fields - company_tags (text[]), company_sources (data_sources[])
4. Build INSERT Query - If main fields exist: INSERT with casts (%s::text[], %s::data_sources[]). If only identifiers: INSERT DEFAULT VALUES
5. Execute Main Insert - Insert into db_companies_main, commit, extract company_id
6. Collect and Insert Identifiers - Batch INSERT for all identifier types
7. Return - (company_id, None) on success

Identifier Types Supported:

• name
• domain
• linkedin
• email
• phone
• instagram
• facebook
• xing
• pinterest
• tiktok
• youtube
• twitter

Database Operations:

• Tables: db_companies_main (main), db_companies_dt_identifiers (identifiers)
• Queries: INSERT with RETURNING (or DEFAULT VALUES), Batch INSERT for identifiers
• Transaction: Committed after each operation

5. Processing Update Functions

Non-Destructive Update Strategy: All update functions use additive operations. Regular fields are updated only if new value provided. Array fields append new items using array_cat() or || operator. Identifiers are only inserted (never deleted). Existing data is always preserved.

5.1 People Update (people_update.py)

Purpose:

Updates existing person records and adds new identifiers (non-destructive).

update_person_identifiers(contact_data, person_id)

Logic Flow:

1. Map Fields for Update - Same field mappings as people_push
2. Build UPDATE Query - If fields exist: UPDATE db_people SET field1 = %s, ... WHERE id = %s
3. Fetch Existing Identifiers - SELECT from db_people_identifiers
4. Identify New Identifiers - Check if (identifier, type) NOT in existing set
5. Batch Insert New Identifiers - Use executemany
6. Return - (fields_updated: bool, identifiers_added: int, error_message)

Non-Destructive: Keeps current field values if new value not provided. Only adds new identifiers, never deletes existing ones.
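
Step 4 (identifying new identifiers) amounts to a set-membership check over (identifier, type) pairs. A minimal sketch, with a hypothetical helper name:

```python
def find_new_identifiers(existing, incoming):
    """Return incoming (identifier, type) pairs not already stored, in input order."""
    seen = set(existing)
    new_pairs = []
    for pair in incoming:
        if pair not in seen:
            new_pairs.append(pair)
            seen.add(pair)  # also guards against duplicates within the input
    return new_pairs
```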

5.2 Lead Update (lead_update.py)

Purpose:

Updates existing lead records and adds new email identifiers (non-destructive).

update_lead_identifiers(contact_data, lead_id)

Logic Flow:

1. Fetch Existing Record - Get lead_sources array
2. Map Regular Fields - Same as lead_push
3. Handle Array Field - Find new items not in existing lead_sources, use array_cat() to append
4. Execute UPDATE - Update fields and append to arrays
5. Fetch Existing Email Identifiers - Build dict of (identifier, type) → status
6. Process Email Identifiers - New emails → inserts list, Changed status → updates list
7. Batch Operations - INSERT new emails, UPDATE changed statuses
8. Return - (fields_updated: bool, identifiers_added: int, error_message)

Non-Destructive: Regular fields keep current values if new not provided. Arrays append new items. Email identifiers are added or status updated (never deleted).
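
Step 6 splits incoming emails into an inserts list and an updates list. The sketch below assumes the existing identifiers have already been loaded into a dictionary; the helper name and tuple layouts are illustrative.

```python
def plan_email_changes(existing, incoming):
    """Split incoming emails into rows to insert and status changes to update.

    existing: dict mapping (identifier, type) -> status
    incoming: list of (identifier, type, status) tuples
    """
    inserts, updates = [], []
    for identifier, id_type, status in incoming:
        key = (identifier, id_type)
        if key not in existing:
            inserts.append((identifier, id_type, status))
        elif existing[key] != status:
            # Ordered for an UPDATE ... SET status = %s WHERE identifier = %s AND type = %s
            updates.append((status, identifier, id_type))
    return inserts, updates
```

Emails whose status is unchanged end up in neither list, so no write is issued for them.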

5.3 Lead Workspace Update (lead_workspace_update.py)

Purpose:

Updates lead workspace connection fields in db_leads_workspace.

update_lead_workspace(lead_workspace_id, lead_qualified_ws)

Updates qualification status for lead workspace connection.

Query:

UPDATE db_leads_workspace SET lead_qualified_ws = %s WHERE id = %s

Return: Tuple (workspace_updated: bool, error_message)

5.4 People Workspace Update (people_workspace_update.py)

Purpose:

Updates people workspace connection fields (currently no updateable fields).

update_people_workspace(contact_data, people_workspace_id)

Placeholder for future workspace field updates. Currently returns success immediately.

Note: db_people_workspace currently only contains (id, people_id, workspace_id). No updateable fields exist. Function exists for consistency and future extensibility.

Return: Tuple (True, None)

5.5 Company Update (company_update.py)

Purpose:

Updates existing company records and adds new identifiers (non-destructive).

update_company_identifiers(company_data, company_main_id)

Logic Flow:

1. Fetch Existing Record - Get company_tags and company_sources arrays
2. Map Regular Fields - Same as company_push
3. Handle Array Fields - Find new items, use || operator with explicit cast (company_tags || %s::text[], company_sources || %s::data_sources[])
4. Execute UPDATE - Update fields and append to arrays
5. Fetch Existing Identifiers - Build set of (identifier, type) tuples
6. Identify New Identifiers - Check all identifier mappings against existing set
7. Batch Insert New Identifiers - Use executemany
8. Return - (fields_updated: bool, identifiers_added: int, error_message)

Non-Destructive: Regular fields keep current values. Arrays append new items using || operator with cast. Identifiers are only added, never deleted.

5.6 Company Workspace Update (company_workspace_update.py)

Purpose:

Updates company workspace connection fields (qualification and tags).

update_company_workspace(company_data, company_workspace_id)

Updates workspace qualification and appends custom tags (non-destructive).

Logic:

1. Fetch existing company_custom_tags_ws array
2. If company_qualified provided: Add to SET clause
3. If company_custom_tags_ws provided: Find new tags, append using || operator with ::text[] cast
4. Execute UPDATE if fields exist
5. Return (workspace_updated: bool, error_message)

Non-Destructive: company_qualified updates to new value. company_custom_tags_ws appends new tags without removing existing ones.

Summary of Patterns and Best Practices

Connection Management Pattern

All functions follow this pattern:

import psycopg2
from psycopg2.extras import RealDictCursor

def example_operation():
    conn = None
    cursor = None
    try:
        conn = get_pg_connection()
        cursor = conn.cursor(cursor_factory=RealDictCursor)
        # ... database operations that produce `result` ...
        conn.commit()
        cursor.close()
        release_pg_connection(conn)
        return (result, None)
    except psycopg2.Error as e:
        # Cleanup is repeated here; the functions do not use a finally block
        if conn:
            conn.rollback()
        if cursor:
            cursor.close()
        if conn:
            release_pg_connection(conn)
        return (None, f"Database error: {e}")

Non-Destructive Update Strategy

All update functions use additive operations:

• Regular fields: Update only if new value provided
• Array fields: Append new items using array_cat() or || operator
• Identifiers: Only insert new ones, never delete existing
• Example: If existing tags = ["tag1"], input tags = ["tag2"], result = ["tag1", "tag2"]
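
The append-only behavior for array fields corresponds to the following in-memory sketch. In the database the same effect is achieved with array_cat() or the || operator after filtering out items that are already present; the helper name here is hypothetical.

```python
def append_new_items(existing, incoming):
    """Return existing items followed by incoming items that are not yet present."""
    result = list(existing or [])
    for item in incoming or []:
        if item not in result:
            result.append(item)
    return result
```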

Error Handling Strategy

Cleaning Functions:

• Try/except around URL parsing
• Return None for invalid data
• Never raise exceptions

Lookup Functions:

• Return None for not found (not an error)
• Return error message for database failures
• Always release connections

Push/Update Functions:

• Rollback transactions on error
• Return tuple (result, error_message)
• Always release connections
• Cascade returns (partial success allowed)

Parameterized Queries

All database queries use parameterized placeholders:

• Prevents SQL injection
• Handles special characters
• Example: WHERE id = %s, params: (id_value,)

Batch Operations

Use executemany for multiple inserts/updates:

• More efficient than multiple execute() calls
• Used for identifiers (people, leads, companies)
• Used for email identifiers with status

cursor.executemany(query, [(data1,), (data2,), ...])
