API Documentation
Complete API reference for Grundwerk Digital platform integration. Build powerful integrations using our RESTful API.
Authentication
All API endpoints require authentication using an API key. Include your API key in the request header with each request.
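As a minimal sketch of the header described above, the following Python helper builds the headers for a request. The header name api_key and the Content-Type value are taken from the curl examples in this reference; the function name auth_headers and the environment variable GRUNDWERK_API_KEY are assumptions made for illustration only.

```python
import os


def auth_headers(api_key: str) -> dict:
    """Return the HTTP headers used by the curl examples in this reference."""
    if not api_key:
        raise ValueError("an API key is required for every request")
    return {"api_key": api_key, "Content-Type": "application/json"}


# Hypothetical usage: read the key from an environment variable (the variable
# name is an assumption) instead of hard-coding it in source files.
headers = auth_headers(os.environ.get("GRUNDWERK_API_KEY", "{{your_api_key}}"))
```

Keeping the key in the environment rather than in code is a common precaution, not a requirement stated by this reference.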
Quick Start
api_key: {{your_api_key}}
Base URL
All API requests should be made to the following base URL:
https://web-production-603a8.up.railway.app
Companies
Manage company records in your database. Create, read, update, and delete company information including business details, contact information, and related metadata.
GET /company_get
Get a single company record by its company_id.
Required: The following fields are mandatory
Query Parameters
company_id (required): The unique identifier of the company
  Example: 00000000-0000-0000-0000-000000000000
workspace_id (client_data): The workspace identifier
  Example: 00000000-0000-0000-0000-000000000000
curl -X GET "https://web-production-603a8.up.railway.app/company_get?company_id=00000000-0000-0000-0000-000000000000&workspace_id=00000000-0000-0000-0000-000000000000" -H "api_key: {{your_api_key}}" -H "Content-Type: application/json"
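The same request URL can be assembled in Python. This is a sketch rather than an official client: the function name company_get_url is hypothetical, and treating workspace_id as optional is an inference from the response description, which says company_workspace_id is null when workspace_id is not provided.

```python
from urllib.parse import urlencode

# Base URL as given in this reference.
BASE_URL = "https://web-production-603a8.up.railway.app"


def company_get_url(company_id: str, workspace_id: str = None) -> str:
    """Build the GET /company_get URL with its query parameters."""
    if not company_id:
        raise ValueError("company_id is required")
    params = {"company_id": company_id}
    # Assumption: workspace_id may be left out when no workspace data is needed.
    if workspace_id is not None:
        params["workspace_id"] = workspace_id
    return f"{BASE_URL}/company_get?{urlencode(params)}"


url = company_get_url(
    "00000000-0000-0000-0000-000000000000",
    workspace_id="00000000-0000-0000-0000-000000000000",
)
```

Using urlencode instead of string concatenation keeps the query string valid if an identifier ever contains characters that need escaping.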
Response Parameters
company_name_cleaned (string): Cleaned/standardized version of the company name
  Example: Example Company GmbH
company_legal_form (string): Legal form of the company (e.g., GmbH, AG, LLC)
  Example: GmbH
b2b_b2c (string): Business model classification
  Example: B2B
company_name_imprint (string): Company name as it appears in official imprint/legal notices
  Example: Example Company GmbH
company_street (string): Street name of company address
  Example: Hauptstraße
company_street_nr (string): Street number of company address
  Example: 123
company_city (string): City where company is located
  Example: Berlin
company_zip (string): Postal/ZIP code
  Example: 10115
company_region (string): Region/state where company is located
  Example: Berlin
company_country (string): Country where company is located
  Example: Germany
company_tax_nr (string): Tax identification number (Steuernummer)
  Example: 12/345/67890
company_vat_nr (string): VAT identification number (Umsatzsteuer-ID)
  Example: DE123456789
company_handels_register (string): Commercial register number (Handelsregisternummer)
  Example: HRB 12345
company_employees_research (string): Employee count from research
  Example: 50-100
company_employees_linkedin (string): Employee count from LinkedIn
  Example: 75
company_founded_year (string): Year the company was founded
  Example: 2010
company_description (string): Company description/about text
  Example: Leading provider of software solutions
company_logo_url (string): URL to company logo image
  Example: https://example.com/logo.png
company_size_linkedin (string): Company size range from LinkedIn
  Example: 51-200 employees
company_linkedin_followers (string): Number of LinkedIn followers
  Example: 1234
company_tags (array): Array of tags/categories for the company
  Example: ["tech","software","b2b"]
company_sources (array): Array of data sources
  Example: ["website","linkedin","research"]
db_companies_main_created_at (string): Timestamp when company record was created (ISO 8601 format)
  Example: 2024-01-15T10:30:00Z
db_companies_main_updated_at (string): Timestamp when company record was last updated (ISO 8601 format)
  Example: 2024-03-20T14:45:00Z
company_names (array): All company names associated with this company. Always present (empty array if none)
  Example: ["Example Company","Example Co.","Example GmbH"]
company_domains (array): All domain names associated with this company. Always present (empty array if none)
  Example: ["example.com","example.de"]
company_emails (array): All email addresses associated with this company. Always present (empty array if none)
  Example: ["info@example.com","contact@example.com"]
company_phones (array): All phone numbers associated with this company. Always present (empty array if none)
  Example: ["+49301234567","+49301234568"]
company_linkedins (array): All LinkedIn URLs associated with this company. Always present (empty array if none)
  Example: ["https://linkedin.com/company/example"]
company_instagrams (array): All Instagram URLs associated with this company. Always present (empty array if none)
  Example: ["https://instagram.com/example"]
company_facebooks (array): All Facebook URLs associated with this company. Always present (empty array if none)
  Example: ["https://facebook.com/example"]
company_xings (array): All Xing URLs associated with this company. Always present (empty array if none)
  Example: []
company_pinterests (array): All Pinterest URLs associated with this company. Always present (empty array if none)
  Example: []
company_tiktoks (array): All TikTok URLs associated with this company. Always present (empty array if none)
  Example: []
company_youtubes (array): All YouTube URLs associated with this company. Always present (empty array if none)
  Example: ["https://youtube.com/@example"]
company_twitters (array): All Twitter/X URLs associated with this company. Always present (empty array if none)
  Example: ["https://twitter.com/example"]
company_workspace_id (uuid): UUID of the workspace connection. null if workspace_id not provided in request OR workspace connection doesn't exist
  Example: 00000000-0000-0000-0000-000000000000
company_qualified (string): Qualification status in workspace. null if not set or no workspace connection
  Example: yes
company_custom_tags_ws (array): Custom workspace-specific tags. null if not set or no workspace connection
  Example: ["priority","enterprise"]
db_companies_workspace_created_at (string): Timestamp when workspace connection was created (ISO 8601 format). null if no workspace connection
  Example: 2024-01-20T11:00:00Z
db_companies_workspace_updated_at (string): Timestamp when workspace connection was last updated (ISO 8601 format). null if no workspace connection
  Example: 2024-03-22T16:30:00Z
{
"company_name_cleaned": "Example Company GmbH",
"company_legal_form": "GmbH",
"b2b_b2c": "B2B",
"company_name_imprint": "Example Company GmbH",
"company_street": "Hauptstraße",
"company_street_nr": "123",
"company_city": "Berlin",
"company_zip": "10115",
"company_region": "Berlin",
"company_country": "Germany",
"company_tax_nr": "12/345/67890",
"company_vat_nr": "DE123456789",
"company_handels_register": "HRB 12345",
"company_employees_research": "50-100",
"company_employees_linkedin": "75",
"company_founded_year": "2010",
"company_description": "Leading provider of software solutions",
"company_logo_url": "https://example.com/logo.png",
"company_size_linkedin": "51-200 employees",
"company_linkedin_followers": "1234",
"company_tags": [
"tech",
"software",
"b2b"
],
"company_sources": [
"website",
"linkedin",
"research"
],
"db_companies_main_created_at": "2024-01-15T10:30:00Z",
"db_companies_main_updated_at": "2024-03-20T14:45:00Z",
"company_names": [
"Example Company",
"Example Co.",
"Example GmbH"
],
"company_domains": [
"example.com",
"example.de"
],
"company_emails": [
"info@example.com",
"contact@example.com"
],
"company_phones": [
"+49301234567",
"+49301234568"
],
"company_linkedins": [
"https://linkedin.com/company/example"
],
"company_instagrams": [
"https://instagram.com/example"
],
"company_facebooks": [
"https://facebook.com/example"
],
"company_xings": [],
"company_pinterests": [],
"company_tiktoks": [],
"company_youtubes": [
"https://youtube.com/@example"
],
"company_twitters": [
"https://twitter.com/example"
],
"company_workspace_id": "ws-uuid-123",
"company_qualified": "yes",
"company_custom_tags_ws": [
"priority",
"enterprise"
],
"db_companies_workspace_created_at": "2024-01-20T11:00:00Z",
"db_companies_workspace_updated_at": "2024-03-22T16:30:00Z"
}
POST /company_lookup
Look up whether a company exists, and push it if it does not.
Required: At least one of the following conditions must be met
Request Body
workspace_id (client_data): The workspace identifier
  Example: 00000000-0000-0000-0000-000000000000
company_name (at least 1 required): The name of the company to look up
  Example: Acme Corporation
company_domain (at least 1 required): The domain of the company to look up
  Example: acme.com
company_linkedin (at least 1 required): The LinkedIn URL of the company to look up
  Example: https://linkedin.com/company/acme
curl -X POST "https://web-production-603a8.up.railway.app/company_lookup" -H "api_key: {{your_api_key}}" -H "Content-Type: application/json" -d '{ "workspace_id": "00000000-0000-0000-0000-000000000000", "company_name": "Acme Corporation", "company_domain": "acme.com", "company_linkedin": "https://linkedin.com/company/acme" }'
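The "at least 1 required" rule for company_name, company_domain, and company_linkedin can be checked on the client before the request is sent. The sketch below is one possible way to do that; the function name company_lookup_body is hypothetical, and the server may apply additional validation that is not documented here.

```python
import json

# The three identifier fields marked "at least 1 required" for /company_lookup.
LOOKUP_IDENTIFIERS = ("company_name", "company_domain", "company_linkedin")


def company_lookup_body(workspace_id=None, **identifiers) -> dict:
    """Build a /company_lookup request body, enforcing the at-least-one rule."""
    unknown = sorted(set(identifiers) - set(LOOKUP_IDENTIFIERS))
    if unknown:
        raise ValueError(f"unexpected fields: {unknown}")
    # Drop empty values so they do not count as identifiers.
    body = {key: value for key, value in identifiers.items() if value}
    if not body:
        raise ValueError("provide at least one of: " + ", ".join(LOOKUP_IDENTIFIERS))
    if workspace_id:
        body["workspace_id"] = workspace_id
    return body


payload = json.dumps(company_lookup_body(company_domain="acme.com"))
```

Failing early on the client avoids a round trip for a request that cannot identify any company.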
Response Parameters
company_main_id (uuid): UUID of the company in the db_companies_main table. Returns UUID if company found, null if not found.
  Example: 00000000-0000-0000-0000-000000000000
company_workspace_id (uuid): UUID of the company-workspace connection in db_companies_workspace table. Only populated if workspace_id was provided in request. Returns UUID if workspace connection exists, null if not found.
  Example: 00000000-0000-0000-0000-000000000000
error (string): Error message if lookup failed. null if no error occurred.
{
"company_main_id": "283f9e1f-89dd-4032-8e02-65acc6856ed1",
"company_workspace_id": "abc-123-def-456",
"error": null
}
POST /company_push
Push new company data to database.
Required: At least one of the following conditions must be met
Request Body
workspace_id (client_data): The workspace identifier
  Example: 00000000-0000-0000-0000-000000000000
company_name (at least 1 required): Company name
  Example: Acme Corporation
company_name_cleaned (optional): Cleaned company name
  Example: Acme Corporation
company_domain (at least 1 required): Company domain
  Example: acme.com
company_linkedin (at least 1 required): Company LinkedIn URL
  Example: https://linkedin.com/company/acme
company_email (optional): Company email address
  Example: info@acme.com
company_phone (optional): Company phone number
  Example: +1234567890
company_instagram (optional): Company Instagram URL
  Example: https://instagram.com/acme
company_facebook (optional): Company Facebook URL
  Example: https://facebook.com/acme
company_xing (optional): Company XING URL
  Example: https://xing.com/companies/acme
company_pinterest (optional): Company Pinterest URL
  Example: https://pinterest.com/acme
company_tiktok (optional): Company TikTok URL
  Example: https://tiktok.com/@acme
company_youtube (optional): Company YouTube URL
  Example: https://youtube.com/@acme
company_twitter (optional): Company Twitter/X URL
  Example: https://twitter.com/acme
company_legal_form (optional): Company legal form
  Example: Inc.
b2b_b2c (optional): Business model (B2B or B2C)
  Example: B2B
  Options: B2B, B2C, both
company_imprint_name (optional): Company imprint name
  Example: Acme Corporation Inc.
company_street (optional): Company street address
  Example: Main Street
company_street_nr (optional): Company street number
  Example: 123
company_city (optional): Company city
  Example: München
company_zip (optional): Company zip code
  Example: 10001
company_region (optional): Company region/state
  Example: Bayern
company_country (optional): Company country
  Example: Germany
company_steuer_nr (optional): Company tax number
  Example: 30/321/50964
company_vat_nr (optional): Company VAT number
  Example: DE123456789
company_register_nr (optional): Company registration number
  Example: HRB 12345
employees_research (optional): Number of employees from research
  Example: 75
employees_linkedin (optional): Number of employees from LinkedIn
  Example: 75
company_founded_year (optional): Year company was founded
  Example: 2010
description (optional): Company description
  Example: Leading provider of innovative solutions
company_logo_url (optional): Company logo URL
  Example: https://acme.com/logo.png
company_size_linkedin (optional): Company size from LinkedIn
  Example: 51-200
company_linkedin_followers (optional): Number of LinkedIn followers
  Example: 1500
company_tags (optional): Company tags
  Example: ["technology","saas"]
company_sources (optional): Data sources
  Example: ["linkedin","website"]
  Options: lusha, clay, apollo, north_data, d7_lead_finder, storeleads, build_with, sales_navigator
company_qualified (optional): Qualification status
  Example: qualified
  Options: qualified, pending, not_qualified
company_custom_tags_ws (optional): Custom workspace tags
  Example: ["vip","partner"]
curl -X POST "https://web-production-603a8.up.railway.app/company_push" -H "api_key: {{your_api_key}}" -H "Content-Type: application/json" -d '{ "workspace_id": "00000000-0000-0000-0000-000000000000", "company_name": "Acme Corporation", "company_name_cleaned": "Acme Corporation", "company_domain": "acme.com", "company_linkedin": "https://linkedin.com/company/acme", "company_email": "info@acme.com", "company_phone": "+1234567890", "company_instagram": "https://instagram.com/acme", "company_facebook": "https://facebook.com/acme", "company_xing": "https://xing.com/companies/acme", "company_pinterest": "https://pinterest.com/acme", "company_tiktok": "https://tiktok.com/@acme", "company_youtube": "https://youtube.com/@acme", "company_twitter": "https://twitter.com/acme", "company_legal_form": "Inc.", "b2b_b2c": "B2B", "company_imprint_name": "Acme Corporation Inc.", "company_street": "Main Street", "company_street_nr": "123", "company_city": "München", "company_zip": "10001", "company_region": "Bayern", "company_country": "Germany", "company_steuer_nr": "30/321/50964", "company_vat_nr": "DE123456789", "company_register_nr": "HRB 12345", "employees_research": 75, "employees_linkedin": 75, "company_founded_year": 2010, "description": "Leading provider of innovative solutions", "company_logo_url": "https://acme.com/logo.png", "company_size_linkedin": "51-200", "company_linkedin_followers": 1500, "company_tags": [ "technology", "saas" ], "company_sources": [ "linkedin", "website" ], "company_qualified": "qualified", "company_custom_tags_ws": [ "vip", "partner" ] }'
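Two of the request fields above list a fixed set of values: b2b_b2c (B2B, B2C, both) and company_qualified (qualified, pending, not_qualified). A client could warn about other values before sending, as in this sketch. It assumes the listed values are the only accepted ones, which this reference does not state explicitly, and the function name check_listed_options is hypothetical. company_sources is left unchecked because its own example uses values outside its listed set.

```python
# Values listed for these fields in the request body description above.
LISTED_OPTIONS = {
    "b2b_b2c": {"B2B", "B2C", "both"},
    "company_qualified": {"qualified", "pending", "not_qualified"},
}


def check_listed_options(body: dict) -> list:
    """Return one warning per field whose value is not among the listed options."""
    problems = []
    for field, allowed in LISTED_OPTIONS.items():
        value = body.get(field)
        # Fields that are absent or None are optional and therefore skipped.
        if value is not None and value not in allowed:
            problems.append(f"{field}: {value!r} is not one of {sorted(allowed)}")
    return problems


warnings = check_listed_options({"company_name": "Acme Corporation", "b2b_b2c": "b2b"})
```

Returning warnings instead of raising lets the caller decide whether to block the request, which seems prudent while the assumption about accepted values is unconfirmed.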
Response Parameters
company_main_id (uuid): UUID of the company (either found or newly created). null only if operation failed.
  Example: 00000000-0000-0000-0000-000000000000
company_workspace_id (uuid): UUID of the company-workspace connection (either found or newly created). Only populated if workspace_id was provided in request. null if workspace operation not performed or failed.
  Example: 00000000-0000-0000-0000-000000000000
status_company (enum): Status of the company record operation. Can be "found" (company already existed), "created" (new company created), or null (operation failed).
  Example: created
  Options: found, created
status_company_workspace (enum): Status of the workspace connection operation. Can be "found" (workspace connection already existed), "created" (new workspace connection created), or null (not performed or failed). Only relevant when workspace_id provided in request.
  Example: created
  Options: found, created
error (string): Error message if any operation failed. null if all operations succeeded.
{
"company_main_id": "283f9e1f-89dd-4032-8e02-65acc6856ed1",
"company_workspace_id": "abc-123-def-456",
"status_company": "created",
"status_company_workspace": "created",
"error": null
}
POST /company_push_patch
Update existing company data in database.
Required: At least one of the following conditions must be met
Request Body
workspace_id (client_data): The workspace identifier
  Example: 00000000-0000-0000-0000-000000000000
company_name (at least 1 required): Company name
  Example: Acme Corporation
company_name_cleaned (optional): Cleaned company name
  Example: Acme Corporation
company_domain (at least 1 required): Company domain
  Example: acme.com
company_linkedin (at least 1 required): Company LinkedIn URL
  Example: https://linkedin.com/company/acme
company_email (optional): Company email address
  Example: info@acme.com
company_phone (optional): Company phone number
  Example: +1234567890
company_instagram (optional): Company Instagram URL
  Example: https://instagram.com/acme
company_facebook (optional): Company Facebook URL
  Example: https://facebook.com/acme
company_xing (optional): Company XING URL
  Example: https://xing.com/companies/acme
company_pinterest (optional): Company Pinterest URL
  Example: https://pinterest.com/acme
company_tiktok (optional): Company TikTok URL
  Example: https://tiktok.com/@acme
company_youtube (optional): Company YouTube URL
  Example: https://youtube.com/@acme
company_twitter (optional): Company Twitter/X URL
  Example: https://twitter.com/acme
company_legal_form (optional): Company legal form
  Example: Inc.
b2b_b2c (optional): Business model (B2B or B2C)
  Example: B2B
  Options: B2B, B2C, both
company_imprint_name (optional): Company imprint name
  Example: Acme Corporation Inc.
company_street (optional): Company street address
  Example: Main Street
company_street_nr (optional): Company street number
  Example: 123
company_city (optional): Company city
  Example: München
company_zip (optional): Company zip code
  Example: 10001
company_region (optional): Company region/state
  Example: Bayern
company_country (optional): Company country
  Example: Germany
company_steuer_nr (optional): Company tax number
  Example: 30/321/50964
company_vat_nr (optional): Company VAT number
  Example: DE123456789
company_register_nr (optional): Company registration number
  Example: HRB 12345
employees_research (optional): Number of employees from research
  Example: 75
employees_linkedin (optional): Number of employees from LinkedIn
  Example: 75
company_founded_year (optional): Year company was founded
  Example: 2010
description (optional): Company description
  Example: Leading provider of innovative solutions
company_logo_url (optional): Company logo URL
  Example: https://acme.com/logo.png
company_size_linkedin (optional): Company size from LinkedIn
  Example: 51-200
company_linkedin_followers (optional): Number of LinkedIn followers
  Example: 1500
company_tags (optional): Company tags
  Example: ["technology","saas"]
company_sources (optional): Data sources
  Example: ["linkedin","website"]
  Options: lusha, clay, apollo, north_data, d7_lead_finder, storeleads, build_with, sales_navigator
company_qualified (optional): Qualification status
  Example: qualified
  Options: qualified, pending, not_qualified
company_custom_tags_ws (optional): Custom workspace tags
  Example: ["vip","partner"]
curl -X POST "https://web-production-603a8.up.railway.app/company_push_patch" -H "api_key: {{your_api_key}}" -H "Content-Type: application/json" -d '{ "workspace_id": "00000000-0000-0000-0000-000000000000", "company_name": "Acme Corporation", "company_name_cleaned": "Acme Corporation", "company_domain": "acme.com", "company_linkedin": "https://linkedin.com/company/acme", "company_email": "info@acme.com", "company_phone": "+1234567890", "company_instagram": "https://instagram.com/acme", "company_facebook": "https://facebook.com/acme", "company_xing": "https://xing.com/companies/acme", "company_pinterest": "https://pinterest.com/acme", "company_tiktok": "https://tiktok.com/@acme", "company_youtube": "https://youtube.com/@acme", "company_twitter": "https://twitter.com/acme", "company_legal_form": "Inc.", "b2b_b2c": "B2B", "company_imprint_name": "Acme Corporation Inc.", "company_street": "Main Street", "company_street_nr": "123", "company_city": "München", "company_zip": "10001", "company_region": "Bayern", "company_country": "Germany", "company_steuer_nr": "30/321/50964", "company_vat_nr": "DE123456789", "company_register_nr": "HRB 12345", "employees_research": 75, "employees_linkedin": 75, "company_founded_year": 2010, "description": "Leading provider of innovative solutions", "company_logo_url": "https://acme.com/logo.png", "company_size_linkedin": "51-200", "company_linkedin_followers": 1500, "company_tags": [ "technology", "saas" ], "company_sources": [ "linkedin", "website" ], "company_qualified": "qualified", "company_custom_tags_ws": [ "vip", "partner" ] }'
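Both /company_push and /company_push_patch document the response fields status_company, status_company_workspace, and error. The following sketch turns such a response into a one-line summary for logging. The function name summarize_write and the wording of the summary are assumptions; only the field names and their documented meanings come from this reference.

```python
def summarize_write(response: dict) -> str:
    """Summarize a /company_push or /company_push_patch response in one line."""
    error = response.get("error")
    if error:
        return f"failed: {error}"
    # A null status_company is documented as "operation failed".
    company = response.get("status_company") or "failed"
    # A null workspace status is documented as "not performed or failed".
    workspace = response.get("status_company_workspace") or "not performed or failed"
    return f"company: {company}; workspace connection: {workspace}"


summary = summarize_write(
    {
        "company_main_id": "283f9e1f-89dd-4032-8e02-65acc6856ed1",
        "status_company": "updated",
        "status_company_workspace": None,
        "error": None,
    }
)
```

Checking error first reflects the documented rule that it is null only when all operations succeeded.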
Response Parameters
company_main_id (uuid): UUID of the company (found, created, or updated). null only if operation failed.
  Example: 00000000-0000-0000-0000-000000000000
company_workspace_id (uuid): UUID of the company-workspace connection (found, created, or updated). Only populated if workspace_id was provided in request. null if workspace operation not performed or failed.
  Example: 00000000-0000-0000-0000-000000000000
status_company (enum): Status of the company record operation. Can be "created" (new company created), "updated" (existing company updated), or null (operation failed). Note: Unlike /company_push, this endpoint never returns "found" - it always updates if found.
  Example: updated
  Options: created, updated
status_company_workspace (enum): Status of the workspace connection operation. Can be "created" (new workspace connection created), "updated" (existing workspace connection updated), or null (not performed or failed). Only relevant when workspace_id provided in request. Note: Always updates if workspace connection exists.
  Example: updated
  Options: created, updated
error (string): Error message if any operation failed. null if all operations succeeded.
{
"company_main_id": "283f9e1f-89dd-4032-8e02-65acc6856ed1",
"company_workspace_id": "abc-123-def-456",
"status_company": "updated",
"status_company_workspace": "updated",
"error": null
}
POST /company_delete_fields
Delete selected fields, identifier values, and array items from an existing company record.
Required: The following fields are mandatory
Request Body
company_id (required): The unique identifier of the company
  Example: 00000000-0000-0000-0000-000000000000
company_workspace_id (client_data): The workspace identifier
  Example: 00000000-0000-0000-0000-000000000000
company_name (optional): Company name to delete
  Example: Acme Corporation
company_domain (optional): Company domain to delete
  Example: acme.com
company_linkedin (optional): Company LinkedIn URL to delete
  Example: https://linkedin.com/company/acme
company_email (optional): Company email address to delete
  Example: info@acme.com
company_phone (optional): Company phone number to delete
  Example: +491234567890
company_instagram (optional): Company Instagram URL to delete
  Example: https://instagram.com/acme
company_facebook (optional): Company Facebook URL to delete
  Example: https://facebook.com/acme
company_xing (optional): Company XING URL to delete
  Example: https://xing.com/companies/acme
company_pinterest (optional): Company Pinterest URL to delete
  Example: https://pinterest.com/acme
company_tiktok (optional): Company TikTok URL to delete
  Example: https://tiktok.com/@acme
company_youtube (optional): Company YouTube URL to delete
  Example: https://youtube.com/@acme
company_twitter (optional): Company Twitter/X URL to delete
  Example: https://twitter.com/acme
company_name_cleaned (optional): Set to true to delete the cleaned company name field, false to keep it
  Example: false
  Options: true, false
company_legal_form (optional): Set to true to delete the company legal form field, false to keep it
  Example: false
  Options: true, false
b2b_b2c (optional): Set to true to delete the business model field, false to keep it
  Example: false
  Options: true, false
company_imprint_name (optional): Set to true to delete the company imprint name field, false to keep it
  Example: false
  Options: true, false
company_street (optional): Set to true to delete the company street field, false to keep it
  Example: false
  Options: true, false
company_street_nr (optional): Set to true to delete the company street number field, false to keep it
  Example: false
  Options: true, false
company_city (optional): Set to true to delete the company city field, false to keep it
  Example: false
  Options: true, false
company_zip (optional): Set to true to delete the company zip code field, false to keep it
  Example: false
  Options: true, false
company_region (optional): Set to true to delete the company region field, false to keep it
  Example: false
  Options: true, false
company_country (optional): Set to true to delete the company country field, false to keep it
  Example: false
  Options: true, false
company_steuer_nr (optional): Set to true to delete the company tax number field, false to keep it
  Example: false
  Options: true, false
company_vat_nr (optional): Set to true to delete the company VAT number field, false to keep it
  Example: false
  Options: true, false
company_register_nr (optional): Set to true to delete the company registration number field, false to keep it
  Example: false
  Options: true, false
employees_research (optional): Set to true to delete the employees research field, false to keep it
  Example: false
  Options: true, false
employees_linkedin (optional): Set to true to delete the employees LinkedIn field, false to keep it
  Example: false
  Options: true, false
company_founded_year (optional): Set to true to delete the company founded year field, false to keep it
  Example: false
  Options: true, false
description (optional): Set to true to delete the company description field, false to keep it
  Example: false
  Options: true, false
company_logo_url (optional): Set to true to delete the company logo URL field, false to keep it
  Example: false
  Options: true, false
company_size_linkedin (optional): Set to true to delete the company size LinkedIn field, false to keep it
  Example: false
  Options: true, false
company_linkedin_followers (optional): Set to true to delete the LinkedIn followers field, false to keep it
  Example: false
  Options: true, false
company_tags (optional): Array of company tag values to delete from the database array. Every value in the array is removed from the stored array field
  Example: ["technology","saas"]
company_sources (optional): Array of data source values to delete from the database array. Every value in the array is removed from the stored array field
  Example: ["linkedin","apollo"]
  Options: lusha, clay, apollo, north_data, d7_lead_finder, storeleads, build_with, sales_navigator
company_qualified (optional): Set to true to delete the qualification status field, false to keep it
  Example: false
  Options: true, false
company_custom_tags_ws (optional): Array of custom workspace tag values to delete from the database array. Every value in the array is removed from the stored array field
  Example: ["vip","partner"]
curl -X POST "https://web-production-603a8.up.railway.app/company_delete_fields" -H "api_key: {{your_api_key}}" -H "Content-Type: application/json" -d '{ "company_id": "00000000-0000-0000-0000-000000000000", "company_workspace_id": "00000000-0000-0000-0000-000000000000", "company_name": "Acme Corporation", "company_domain": "acme.com", "company_linkedin": "https://linkedin.com/company/acme", "company_email": "info@acme.com", "company_phone": "+491234567890", "company_instagram": "https://instagram.com/acme", "company_facebook": "https://facebook.com/acme", "company_xing": "https://xing.com/companies/acme", "company_pinterest": "https://pinterest.com/acme", "company_tiktok": "https://tiktok.com/@acme", "company_youtube": "https://youtube.com/@acme", "company_twitter": "https://twitter.com/acme", "company_name_cleaned": false, "company_legal_form": false, "b2b_b2c": false, "company_imprint_name": false, "company_street": false, "company_street_nr": false, "company_city": false, "company_zip": false, "company_region": false, "company_country": false, "company_steuer_nr": false, "company_vat_nr": false, "company_register_nr": false, "employees_research": false, "employees_linkedin": false, "company_founded_year": false, "description": false, "company_logo_url": false, "company_size_linkedin": false, "company_linkedin_followers": false, "company_tags": [ "technology", "saas" ], "company_sources": [ "linkedin", "apollo" ], "company_qualified": false, "company_custom_tags_ws": [ "vip", "partner" ] }'
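The request body above mixes three kinds of fields: identifier fields that carry the exact value to delete, scalar fields that are cleared by sending true, and array fields that list the items to remove. The sketch below builds a body with only the fields that should change. The function name and its parameters are hypothetical, and it assumes that omitting an optional boolean field behaves like sending false, which this reference does not confirm.

```python
def delete_fields_body(company_id, company_workspace_id=None,
                       identifier_values=None, clear_fields=(), array_items=None):
    """Build a /company_delete_fields body from the three kinds of fields."""
    if not company_id:
        raise ValueError("company_id is required")
    body = {"company_id": company_id}
    if company_workspace_id:
        body["company_workspace_id"] = company_workspace_id
    # Identifier fields carry the exact value to delete, e.g. a domain.
    body.update(identifier_values or {})
    # Scalar fields are cleared by sending true.
    for field in clear_fields:
        body[field] = True
    # Array fields list the individual items to remove.
    for field, items in (array_items or {}).items():
        body[field] = list(items)
    return body


body = delete_fields_body(
    "00000000-0000-0000-0000-000000000000",
    identifier_values={"company_domain": "acme.com"},
    clear_fields=["company_street", "company_zip"],
    array_items={"company_tags": ["technology", "saas"]},
)
```

Sending only the fields to change keeps the payload short compared with listing every flag as false, provided the assumption about omitted fields holds.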
Response Parameters
success (boolean): Indicates whether the operation completed successfully. true = operation completed (even if no fields were deleted), false = operation failed due to error.
  Example: true
message (string): Detailed message describing what operations were performed. Success format: "Successfully completed: <list of operations>". Message may contain:
  "Set N fields to NULL" (boolean fields set to NULL in db_companies_main)
  "Removed N items from company_tags" (tags removed from company_tags array)
  "Removed N items from company_sources" (sources removed from company_sources array)
  "Deleted N identifier records" (identifier records deleted from db_companies_dt_identifiers)
  "Workspace: Set company_qualified to NULL" (workspace field set to NULL)
  "Workspace: Removed N custom tags" (custom tags removed from workspace)
  Example: Successfully completed: Set 3 fields to NULL; Removed 2 items from company_tags; Deleted 1 identifier records; Workspace: Set company_qualified to NULL, Removed 2 custom tags
{
"message": "Successfully completed: Set 3 fields to NULL; Removed 2 items from company_tags; Deleted 1 identifier records; Workspace: Set company_qualified to NULL, Removed 2 custom tags"
}
People
Manage people and contact records. Create, read, update, and delete person information including names, email addresses, phone numbers, and associated company relationships.
GET /contact_get
Get a single contact record by its lead_id.
Required: The following fields are mandatory
Query Parameters
lead_id (required): The unique identifier of the lead/contact
  Example: 00000000-0000-0000-0000-000000000000
workspace_id (client_data): The workspace identifier
  Example: 00000000-0000-0000-0000-000000000000
company_id: Company identifier
  Example: 00000000-0000-0000-0000-000000000000
curl -X GET "https://web-production-603a8.up.railway.app/contact_get?lead_id=00000000-0000-0000-0000-000000000000&workspace_id=00000000-0000-0000-0000-000000000000&company_id=00000000-0000-0000-0000-000000000000" -H "api_key: {{your_api_key}}" -H "Content-Type: application/json"
Response Parameters
people_id (string)
  Example: person-uuid-456
companies_main_id (string)
  Example: company-uuid-789
lead_position (string)
  Example: Chief Executive Officer
lead_position_cleaned (string)
  Example: CEO
lead_seniority (string)
  Example: C-Level
lead_departement (string)
  Example: Management
still_at_company (string)
  Example: yes
lead_start_date (string)
  Example: 2020-01-15
lead_end_date (object): Object containing nested data
lead_seniority_enum (string)
  Example: c_level
lead_departement_enum (string)
  Example: management
lead_position_clean_plural_dativ (string)
  Example: CEOs
lead_position_clean_plural_nominativ (string)
  Example: CEOs
lead_summary (string)
  Example: Experienced executive with 15 years in tech
lead_sources (array): Array of values
db_leads_created_at (string)
  Example: 2024-02-10T09:15:00Z
db_leads_updated_at (string)
  Example: 2024-03-25T11:20:00Z
person_first_name (string)
  Example: John
contact_first_name_cleaned (string)
  Example: John
person_last_name (string)
  Example: Doe
contact_last_name_cleaned (string)
  Example: Doe
person_gender (string)
  Example: male
person_language (string)
  Example: de
contact_estimated_birth_year (string)
  Example: 1980
contact_birth_year (string)
  Example: 1982
contact_birth_date (string)
  Example: 1982-05-15
person_country (string)
  Example: Germany
person_city (string)
  Example: Berlin
linkedin_cv (string)
  Example: Extensive experience in software development and leadership
linkedin_volunteerings (string)
  Example: Board member at Tech for Good
started_education_linkedin (string)
  Example: 2000
first_job_start_linkedin (string)
  Example: 2004
contact_location (string)
  Example: Berlin, Germany
contact_academic_title (string)
  Example: Dr.
person_state (string)
  Example: Berlin
person_native_german (string)
  Example: yes
person_scooling_country (string)
  Example: Germany
contact_linkedin_image_url (string)
  Example: https://media.linkedin.com/profile.jpg
person_linkedin_followers (string)
  Example: 2500
person_linkedin_connections (string)
  Example: 500+
db_people_created_at (string)
  Example: 2024-01-05T08:30:00Z
db_people_updated_at (string)
  Example: 2024-03-18T10:45:00Z
contact_linkedins (array): Array of values
contact_xings (array): Array of values
contact_emails_valid (array): Array of values
contact_emails_invalid (array): Array of values
contact_emails_catch_all (array): Array of values
contact_emails_wrong (array): Array of values
contact_emails_unsure (array): Array of values
lead_workspace_id (string)
  Example: lead-ws-uuid-345
lead_qualified_ws (string)
  Example: yes
db_leads_workspace_created_at (string)
  Example: 2024-02-15T13:00:00Z
db_leads_workspace_updated_at (string)
  Example: 2024-03-26T15:30:00Z
{
"people_id": "person-uuid-456",
"companies_main_id": "company-uuid-789",
"lead_position": "Chief Executive Officer",
"lead_position_cleaned": "CEO",
"lead_seniority": "C-Level",
"lead_departement": "Management",
"still_at_company": "yes",
"lead_start_date": "2020-01-15",
"lead_end_date": null,
"lead_seniority_enum": "c_level",
"lead_departement_enum": "management",
"lead_position_clean_plural_dativ": "CEOs",
"lead_position_clean_plural_nominativ": "CEOs",
"lead_summary": "Experienced executive with 15 years in tech",
"lead_sources": [
"linkedin",
"company_website"
],
"db_leads_created_at": "2024-02-10T09:15:00Z",
"db_leads_updated_at": "2024-03-25T11:20:00Z",
"person_first_name": "John",
"contact_first_name_cleaned": "John",
"person_last_name": "Doe",
"contact_last_name_cleaned": "Doe",
"person_gender": "male",
"person_language": "de",
"contact_estimated_birth_year": "1980",
"contact_birth_year": "1982",
"contact_birth_date": "1982-05-15",
"person_country": "Germany",
"person_city": "Berlin",
"linkedin_cv": "Extensive experience in software development and leadership",
"linkedin_volunteerings": "Board member at Tech for Good",
"started_education_linkedin": "2000",
"first_job_start_linkedin": "2004",
"contact_location": "Berlin, Germany",
"contact_academic_title": "Dr.",
"person_state": "Berlin",
"person_native_german": "yes",
"person_scooling_country": "Germany",
"contact_linkedin_image_url": "https://media.linkedin.com/profile.jpg",
"person_linkedin_followers": "2500",
"person_linkedin_connections": "500+",
"db_people_created_at": "2024-01-05T08:30:00Z",
"db_people_updated_at": "2024-03-18T10:45:00Z",
"contact_linkedins": [
"https://linkedin.com/in/johndoe",
"https://linkedin.com/in/john-doe"
],
"contact_xings": [
"https://xing.com/profile/johndoe"
],
"contact_emails_valid": [
"john@example.com",
"j.doe@example.com"
],
"contact_emails_invalid": [
"oldaddress@defunct.com"
],
"contact_emails_catch_all": [
"info@example.com"
],
"contact_emails_wrong": [],
"contact_emails_unsure": [
"john.doe@maybe.com"
],
"lead_workspace_id": "lead-ws-uuid-345",
"lead_qualified_ws": "yes",
"db_leads_workspace_created_at": "2024-02-15T13:00:00Z",
"db_leads_workspace_updated_at": "2024-03-26T15:30:00Z"
}
POST /contact_lookup
Look up whether a lead and person exist, and push them if they do not.
Required: At least one of the following conditions must be met
Request Body
| Parameter | Requirement | Description | Example |
|---|---|---|---|
| workspace_id | client_data | The workspace identifier | 00000000-0000-0000-0000-000000000000 |
| company_id | at least 1 required | Company identifier | 00000000-0000-0000-0000-000000000000 |
| contact_linkedin | at least 1 required | Contact LinkedIn URL | https://linkedin.com/in/max-mueller |
| contact_xing | at least 1 required | Contact XING URL | https://xing.com/profile/max-mueller |
| contact_email_valid | at least 1 required | Valid email address | max.mueller@example.com |
| contact_email_catch_all | at least 1 required | Catch-all email address | info@example.com |
| contact_email_invalid | at least 1 required | Invalid email address | invalid@example.com |
| contact_email_unsure | at least 1 required | Unsure email address | unsure@example.com |
| contact_first_name | at least 1 required | Contact first name | Max |
| contact_first_name_cleaned | at least 1 required | Cleaned contact first name | Max |
| contact_last_name | at least 1 required | Contact last name | Müller |
| contact_last_name_cleaned | at least 1 required | Cleaned contact last name | Mueller |

curl -X POST "https://web-production-603a8.up.railway.app/contact_lookup" \
  -H "api_key: {{your_api_key}}" \
  -H "Content-Type: application/json" \
  -d '{ "workspace_id": "00000000-0000-0000-0000-000000000000", "company_id": "00000000-0000-0000-0000-000000000000",
        "contact_linkedin": "https://linkedin.com/in/max-mueller", "contact_xing": "https://xing.com/profile/max-mueller",
        "contact_email_valid": "max.mueller@example.com", "contact_email_catch_all": "info@example.com",
        "contact_email_invalid": "invalid@example.com", "contact_email_unsure": "unsure@example.com",
        "contact_first_name": "Max", "contact_first_name_cleaned": "Max",
        "contact_last_name": "Müller", "contact_last_name_cleaned": "Mueller" }'
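The same request can be assembled in Python. The following is a minimal sketch using only the standard library; `build_lookup_request` and `AT_LEAST_ONE` are hypothetical names, and the client-side check reflects one reading of the "at least 1 required" rule (any one of the marked fields is enough), which the server may enforce differently.

```python
import json
import urllib.request

BASE_URL = "https://web-production-603a8.up.railway.app"

# Fields marked "at least 1 required" in the Request Body parameters.
AT_LEAST_ONE = (
    "company_id", "contact_linkedin", "contact_xing",
    "contact_email_valid", "contact_email_catch_all",
    "contact_email_invalid", "contact_email_unsure",
    "contact_first_name", "contact_first_name_cleaned",
    "contact_last_name", "contact_last_name_cleaned",
)


def build_lookup_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST /contact_lookup request."""
    # Assumed interpretation: one non-empty identifying field is sufficient.
    if not any(payload.get(name) for name in AT_LEAST_ONE):
        raise ValueError("payload needs at least one identifying field")
    return urllib.request.Request(
        f"{BASE_URL}/contact_lookup",
        data=json.dumps(payload).encode("utf-8"),
        headers={"api_key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


request = build_lookup_request(
    "your_api_key",
    {
        "workspace_id": "00000000-0000-0000-0000-000000000000",
        "contact_linkedin": "https://linkedin.com/in/max-mueller",
    },
)
# To send it: urllib.request.urlopen(request), then json.load() the response.
```

Building the request separately from sending it keeps the validation testable without network access.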
Response Parameters
| Parameter | Type | Description | Example |
|---|---|---|---|
| lead_id | uuid | UUID of the lead in db_leads table. Represents the connection between a person and a company. Returns UUID if lead found, null if not found. | 00000000-0000-0000-0000-000000000000 |
| person_id | uuid | UUID of the person in db_people table. Returns UUID if person found, null if not found. | 00000000-0000-0000-0000-000000000000 |
| lead_workspace_id | uuid | UUID of the lead-workspace connection in db_leads_workspace table. Only populated if workspace_id was provided in request. Returns UUID if workspace connection exists, null if not found. | 00000000-0000-0000-0000-000000000000 |
| people_workspace_id | uuid | UUID of the people-workspace connection in db_people_workspace table. Only populated if workspace_id was provided in request. Returns UUID if workspace connection exists, null if not found. | 00000000-0000-0000-0000-000000000000 |
| error | string | Error message if lookup failed. null if no error occurred. | |
{
"lead_id": "lead-uuid-123",
"person_id": "person-uuid-456",
"lead_workspace_id": "lead-ws-uuid-789",
"people_workspace_id": "people-ws-uuid-012",
"error": null
}
POST/contact_push
Push new contact data to the database.
Required: At least one of the following conditions must be met
Request Body
| Parameter | Requirement | Description | Example | Allowed values |
|---|---|---|---|---|
| workspace_id | client_data | The workspace identifier | 00000000-0000-0000-0000-000000000000 | |
| company_id | at least 1 required | Company identifier | 00000000-0000-0000-0000-000000000000 | |
| contact_email_valid | at least 1 required | Valid email address | max.mueller@example.com | |
| contact_email_catch_all | at least 1 required | Catch-all email address | info@example.com | |
| contact_email_invalid | at least 1 required | Invalid email address | invalid@example.com | |
| contact_email_unsure | at least 1 required | Unsure email address | unsure@example.com | |
| contact_linkedin | at least 1 required | Contact LinkedIn URL | https://linkedin.com/in/max-mueller | |
| contact_xing | at least 1 required | Contact XING URL | https://xing.com/profile/max-mueller | |
| contact_first_name | at least 1 required | Contact first name | Max | |
| contact_first_name_cleaned | at least 1 required | Cleaned contact first name | Max | |
| contact_last_name | at least 1 required | Contact last name | Müller | |
| contact_last_name_cleaned | at least 1 required | Cleaned contact last name | Mueller | |
| contact_gender | optional | Contact gender | Male | |
| contact_language | optional | Contact language | German | |
| contact_estimated_birth_year | optional | Estimated birth year | 1990 | |
| contact_birth_year | optional | Birth year | 1990 | |
| contact_birth_date | optional | Birth date | 1990-05-15 | |
| person_country | optional | Person country | Germany | |
| person_city | optional | Person city | München | |
| linkedin_cv | optional | LinkedIn CV data | {"experience":[{"company":"Example GmbH","position":"Sales Manager","duration":"2020-2023"}]} | |
| linkedin_volunteerings | optional | LinkedIn volunteering activities | {"organizations":["Non-Profit Organization"]} | |
| started_education_linkedin | optional | Education start date from LinkedIn | 2010 | |
| first_job_start_linkedin | optional | First job start date from LinkedIn | 2015 | |
| contact_location | optional | Contact location | München, Bayern, Germany | |
| contact_academic_title | optional | Academic title | Dr. | |
| person_state | optional | Person state/region | Bayern | |
| person_native_german | optional | Native German speaker indicator | true | |
| person_scooling_country | optional | Schooling country | Germany | |
| contact_linkedin_image_url | optional | LinkedIn profile image URL | https://media.licdn.com/dms/image/example/profile.jpg | |
| person_linkedin_followers | optional | Number of LinkedIn followers | 500 | |
| person_linkedin_connections | optional | Number of LinkedIn connections | 300 | |
| lead_position | optional | Job position | Sales Manager | |
| lead_position_cleaned | optional | Cleaned job position | Sales Manager | |
| lead_seniority | optional | Seniority level | Manager | |
| lead_departement | optional | Department | Sales | |
| still_at_company | optional | Still employed at company | true | true, false |
| lead_start_date | optional | Position start date | 2020-01-01 | |
| lead_end_date | optional | Position end date | 2023-12-31 | |
| lead_seniority_enum | optional | Seniority level enum | manager | c-level, geschäftsführung, head, manager, entry, director, partner, president, intern |
| lead_departement_enum | optional | Department enum | sales | marketing, sales, geschäftsführung, procurement, legal, accounting, finance |
| lead_position_clean_plural_dativ | optional | Position in plural dative form | Vertriebsleitern | |
| lead_position_clean_plural_nominativ | optional | Position in plural nominative form | Vertriebsleiter | |
| lead_summary | optional | Lead summary | Experienced sales professional with 10+ years in B2B software sales | |
| lead_sources | optional | Lead sources | ["apollo","sales_navigator"] | lusha, clay, apollo, north_data, d7_lead_finder, storeleads, build_with, sales_navigator |
| lead_qualified_ws | optional | Workspace qualification status | qualified | qualified, pending, not_qualified |

curl -X POST "https://web-production-603a8.up.railway.app/contact_push" \
  -H "api_key: {{your_api_key}}" \
  -H "Content-Type: application/json" \
  -d '{ "workspace_id": "00000000-0000-0000-0000-000000000000", "company_id": "00000000-0000-0000-0000-000000000000",
        "contact_email_valid": "max.mueller@example.com", "contact_email_catch_all": "info@example.com",
        "contact_email_invalid": "invalid@example.com", "contact_email_unsure": "unsure@example.com",
        "contact_linkedin": "https://linkedin.com/in/max-mueller", "contact_xing": "https://xing.com/profile/max-mueller",
        "contact_first_name": "Max", "contact_first_name_cleaned": "Max",
        "contact_last_name": "Müller", "contact_last_name_cleaned": "Mueller",
        "contact_gender": "Male", "contact_language": "German",
        "contact_estimated_birth_year": 1990, "contact_birth_year": 1990, "contact_birth_date": "1990-05-15",
        "person_country": "Germany", "person_city": "München",
        "linkedin_cv": { "experience": [ { "company": "Example GmbH", "position": "Sales Manager", "duration": "2020-2023" } ] },
        "linkedin_volunteerings": { "organizations": [ "Non-Profit Organization" ] },
        "started_education_linkedin": 2010, "first_job_start_linkedin": 2015,
        "contact_location": "München, Bayern, Germany", "contact_academic_title": "Dr.",
        "person_state": "Bayern", "person_native_german": "true", "person_scooling_country": "Germany",
        "contact_linkedin_image_url": "https://media.licdn.com/dms/image/example/profile.jpg",
        "person_linkedin_followers": 500, "person_linkedin_connections": 300,
        "lead_position": "Sales Manager", "lead_position_cleaned": "Sales Manager",
        "lead_seniority": "Manager", "lead_departement": "Sales", "still_at_company": true,
        "lead_start_date": "2020-01-01", "lead_end_date": "2023-12-31",
        "lead_seniority_enum": "manager", "lead_departement_enum": "sales",
        "lead_position_clean_plural_dativ": "Vertriebsleitern", "lead_position_clean_plural_nominativ": "Vertriebsleiter",
        "lead_summary": "Experienced sales professional with 10+ years in B2B software sales",
        "lead_sources": [ "apollo", "sales_navigator" ], "lead_qualified_ws": "qualified" }'
Response Parameters
| Parameter | Type | Description | Example | Allowed values |
|---|---|---|---|---|
| person_id | uuid | UUID of the person record (either found or newly created). null only if person operation failed. | 00000000-0000-0000-0000-000000000000 | |
| lead_id | uuid | UUID of the lead record (either found or newly created). Lead connects a person to a company with position/role information. null if lead operation not performed or failed. Requires both person_id and company_id to be present. | 00000000-0000-0000-0000-000000000000 | |
| people_workspace_id | uuid | UUID of the people-workspace connection (either found or newly created). Only populated if workspace_id was provided in request. null if workspace operation not performed or failed. | 00000000-0000-0000-0000-000000000000 | |
| lead_workspace_id | uuid | UUID of the lead-workspace connection (either found or newly created). Only populated if workspace_id was provided in request. null if workspace operation not performed or failed. | 00000000-0000-0000-0000-000000000000 | |
| status_person | enum | Status of the person record operation. Can be "found" (person already existed), "created" (new person created), or null (operation failed). | created | found, created |
| status_lead | enum | Status of the lead record operation. Can be "found" (lead already existed), "created" (new lead created), or null (not performed or failed). Only created if both person_id and company_id exist. | created | found, created |
| status_people_workspace | enum | Status of the people-workspace connection operation. Can be "found" (workspace connection already existed), "created" (new workspace connection created), or null (not performed or failed). Only relevant when workspace_id provided in request. | created | found, created |
| status_lead_workspace | enum | Status of the lead-workspace connection operation. Can be "found" (workspace connection already existed), "created" (new workspace connection created), or null (not performed or failed). Only relevant when workspace_id provided in request. | created | found, created |
| error | string | Error message if any operation failed. null if all operations succeeded. | | |
{
"person_id": "person-uuid-456",
"lead_id": "lead-uuid-123",
"people_workspace_id": "people-ws-uuid-012",
"lead_workspace_id": "lead-ws-uuid-789",
"status_person": "created",
"status_lead": "created",
"status_people_workspace": "created",
"status_lead_workspace": "created",
"error": null
}
POST/contact_push_patch
Update existing contact data in the database.
Required: At least one of the following conditions must be met
Request Body
| Parameter | Requirement | Description | Example | Allowed values |
|---|---|---|---|---|
| workspace_id | client_data | The workspace identifier | 00000000-0000-0000-0000-000000000000 | |
| company_id | at least 1 required | Company identifier | 00000000-0000-0000-0000-000000000000 | |
| contact_email_valid | at least 1 required | Valid email address | max.mueller@example.com | |
| contact_email_catch_all | at least 1 required | Catch-all email address | info@example.com | |
| contact_email_invalid | at least 1 required | Invalid email address | invalid@example.com | |
| contact_email_unsure | at least 1 required | Unsure email address | unsure@example.com | |
| contact_linkedin | at least 1 required | Contact LinkedIn URL | https://linkedin.com/in/max-mueller | |
| contact_xing | at least 1 required | Contact XING URL | https://xing.com/profile/max-mueller | |
| contact_first_name | at least 1 required | Contact first name | Max | |
| contact_first_name_cleaned | at least 1 required | Cleaned contact first name | Max | |
| contact_last_name | at least 1 required | Contact last name | Müller | |
| contact_last_name_cleaned | at least 1 required | Cleaned contact last name | Mueller | |
| contact_gender | optional | Contact gender | Male | |
| contact_language | optional | Contact language | German | |
| contact_estimated_birth_year | optional | Estimated birth year | 1990 | |
| contact_birth_year | optional | Birth year | 1990 | |
| contact_birth_date | optional | Birth date | 1990-05-15 | |
| person_country | optional | Person country | Germany | |
| person_city | optional | Person city | München | |
| linkedin_cv | optional | LinkedIn CV data | {"experience":[{"company":"Example GmbH","position":"Sales Manager","duration":"2020-2023"}]} | |
| linkedin_volunteerings | optional | LinkedIn volunteering activities | {"organizations":["Non-Profit Organization"]} | |
| started_education_linkedin | optional | Education start date from LinkedIn | 2010 | |
| first_job_start_linkedin | optional | First job start date from LinkedIn | 2015 | |
| contact_location | optional | Contact location | München, Bayern, Germany | |
| contact_academic_title | optional | Academic title | Dr. | |
| person_state | optional | Person state/region | Bayern | |
| person_native_german | optional | Native German speaker indicator | true | |
| person_scooling_country | optional | Schooling country | Germany | |
| contact_linkedin_image_url | optional | LinkedIn profile image URL | https://media.licdn.com/dms/image/example/profile.jpg | |
| person_linkedin_followers | optional | Number of LinkedIn followers | 500 | |
| person_linkedin_connections | optional | Number of LinkedIn connections | 300 | |
| lead_position | optional | Job position | Sales Manager | |
| lead_position_cleaned | optional | Cleaned job position | Sales Manager | |
| lead_seniority | optional | Seniority level | Manager | |
| lead_departement | optional | Department | Sales | |
| still_at_company | optional | Still employed at company | true | true, false |
| lead_start_date | optional | Position start date | 2020-01-01 | |
| lead_end_date | optional | Position end date | 2023-12-31 | |
| lead_position_clean_plural_dativ | optional | Position in plural dative form | Vertriebsleitern | |
| lead_position_clean_plural_nominativ | optional | Position in plural nominative form | Vertriebsleiter | |
| lead_summary | optional | Lead summary | Experienced sales professional with 10+ years in B2B software sales | |
| lead_sources | optional | Lead sources | ["apollo","sales_navigator"] | lusha, clay, apollo, north_data, d7_lead_finder, storeleads, build_with, sales_navigator |
| lead_qualified_ws | optional | Workspace qualification status | qualified | qualified, pending, not_qualified |

curl -X POST "https://web-production-603a8.up.railway.app/contact_push_patch" \
  -H "api_key: {{your_api_key}}" \
  -H "Content-Type: application/json" \
  -d '{ "workspace_id": "00000000-0000-0000-0000-000000000000", "company_id": "00000000-0000-0000-0000-000000000000",
        "contact_email_valid": "max.mueller@example.com", "contact_email_catch_all": "info@example.com",
        "contact_email_invalid": "invalid@example.com", "contact_email_unsure": "unsure@example.com",
        "contact_linkedin": "https://linkedin.com/in/max-mueller", "contact_xing": "https://xing.com/profile/max-mueller",
        "contact_first_name": "Max", "contact_first_name_cleaned": "Max",
        "contact_last_name": "Müller", "contact_last_name_cleaned": "Mueller",
        "contact_gender": "Male", "contact_language": "German",
        "contact_estimated_birth_year": 1990, "contact_birth_year": 1990, "contact_birth_date": "1990-05-15",
        "person_country": "Germany", "person_city": "München",
        "linkedin_cv": { "experience": [ { "company": "Example GmbH", "position": "Sales Manager", "duration": "2020-2023" } ] },
        "linkedin_volunteerings": { "organizations": [ "Non-Profit Organization" ] },
        "started_education_linkedin": 2010, "first_job_start_linkedin": 2015,
        "contact_location": "München, Bayern, Germany", "contact_academic_title": "Dr.",
        "person_state": "Bayern", "person_native_german": "true", "person_scooling_country": "Germany",
        "contact_linkedin_image_url": "https://media.licdn.com/dms/image/example/profile.jpg",
        "person_linkedin_followers": 500, "person_linkedin_connections": 300,
        "lead_position": "Sales Manager", "lead_position_cleaned": "Sales Manager",
        "lead_seniority": "Manager", "lead_departement": "Sales", "still_at_company": true,
        "lead_start_date": "2020-01-01", "lead_end_date": "2023-12-31",
        "lead_position_clean_plural_dativ": "Vertriebsleitern", "lead_position_clean_plural_nominativ": "Vertriebsleiter",
        "lead_summary": "Experienced sales professional with 10+ years in B2B software sales",
        "lead_sources": [ "apollo", "sales_navigator" ], "lead_qualified_ws": "qualified" }'
Response Parameters
| Parameter | Type | Description | Example | Allowed values |
|---|---|---|---|---|
| person_id | uuid | UUID of the person record (found, created, or updated). null only if person operation failed. | 00000000-0000-0000-0000-000000000000 | |
| lead_id | uuid | UUID of the lead record (found, created, or updated). Lead connects a person to a company with position/role information. null if lead operation not performed or failed. Requires both person_id and company_id to be present. | 00000000-0000-0000-0000-000000000000 | |
| people_workspace_id | uuid | UUID of the people-workspace connection (found, created, or updated). Only populated if workspace_id was provided in request. null if workspace operation not performed or failed. | 00000000-0000-0000-0000-000000000000 | |
| lead_workspace_id | uuid | UUID of the lead-workspace connection (found, created, or updated). Only populated if workspace_id was provided in request. null if workspace operation not performed or failed. | 00000000-0000-0000-0000-000000000000 | |
| status_person | enum | Status of the person record operation. Can be "created" (new person created), "updated" (existing person updated), or null (operation failed). Note: Unlike /contact_push, this endpoint never returns "found"; it always updates if found. | updated | created, updated |
| status_lead | enum | Status of the lead record operation. Can be "created" (new lead created), "updated" (existing lead updated), or null (not performed or failed). Note: Always updates if lead exists. | updated | created, updated |
| status_people_workspace | enum | Status of the people-workspace connection operation. Can be "created" (new workspace connection created), "updated" (existing workspace connection updated), or null (not performed or failed). Only relevant when workspace_id provided in request. Note: Always updates if workspace connection exists. | updated | created, updated |
| status_lead_workspace | enum | Status of the lead-workspace connection operation. Can be "created" (new workspace connection created), "updated" (existing workspace connection updated), or null (not performed or failed). Only relevant when workspace_id provided in request. Note: Always updates if workspace connection exists. | updated | created, updated |
| error | string | Error message if any operation failed. null if all operations succeeded. | | |
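A client usually wants to know which of the four records were touched and how. The sketch below groups the `status_*` fields by outcome; `summarize_statuses` is a hypothetical helper, and it assumes the response has already been decoded into a Python dict so that a JSON null arrives as `None`. It accepts "found" as well, so the same helper can be used for /contact_push responses.

```python
STATUS_KEYS = (
    "status_person",
    "status_lead",
    "status_people_workspace",
    "status_lead_workspace",
)


def summarize_statuses(response: dict) -> dict:
    """Group the status_* fields of a push response by outcome."""
    summary = {"created": [], "updated": [], "found": [], "skipped_or_failed": []}
    for key in STATUS_KEYS:
        status = response.get(key)
        # null (None) means the operation was not performed or failed.
        bucket = status if status in ("created", "updated", "found") else "skipped_or_failed"
        summary[bucket].append(key[len("status_"):])
    return summary


example = {
    "status_person": "updated",
    "status_lead": "created",
    "status_people_workspace": None,
    "status_lead_workspace": None,
    "error": None,
}
print(summarize_statuses(example))
```

Entries in `skipped_or_failed` are worth checking against the `error` field, since a null status covers both "not performed" and "failed".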
{
"person_id": "person-uuid-456",
"lead_id": "lead-uuid-123",
"people_workspace_id": "people-ws-uuid-012",
"lead_workspace_id": "lead-ws-uuid-789",
"status_person": "updated",
"status_lead": "updated",
"status_people_workspace": "updated",
"status_lead_workspace": "updated",
"error": null
}
POST/contact_delete_fields
Delete individual field values from an existing person and lead record.
Required: The following fields are mandatory
Request Body
| Parameter | Requirement | Description | Example | Allowed values |
|---|---|---|---|---|
| people_id | required | The unique identifier of the person | 00000000-0000-0000-0000-000000000000 | |
| lead_id | required | The unique identifier of the lead | 00000000-0000-0000-0000-000000000000 | |
| people_workspace_id | client_data | The workspace identifier for people | 00000000-0000-0000-0000-000000000000 | |
| leads_workspace_id | client_data | The workspace identifier for leads | 00000000-0000-0000-0000-000000000000 | |
| contact_linkedin | optional | The exact LinkedIn URL value to delete from the database | https://linkedin.com/in/john-doe | |
| contact_xing | optional | The exact XING URL value to delete from the database | https://xing.com/profile/john-doe | |
| contact_email_valid | optional | The exact valid email address to delete from the database | john.doe@example.com | |
| contact_email_catch_all | optional | The exact catch-all email address to delete from the database | contact@example.com | |
| contact_email_invalid | optional | The exact invalid email address to delete from the database | invalid@example.com | |
| contact_email_unsure | optional | The exact unsure email address to delete from the database | unsure@example.com | |
| contact_first_name_cleaned | optional | Set to true to delete the cleaned first name field, false to keep it | false | true, false |
| contact_last_name_cleaned | optional | Set to true to delete the cleaned last name field, false to keep it | false | true, false |
| person_gender | optional | Set to true to delete the gender field, false to keep it | false | true, false |
| person_language | optional | Set to true to delete the language field, false to keep it | false | true, false |
| contact_estimated_birth_year | optional | Set to true to delete the estimated birth year field, false to keep it | false | true, false |
| contact_birth_year | optional | Set to true to delete the birth year field, false to keep it | false | true, false |
| contact_birth_date | optional | Set to true to delete the birth date field, false to keep it | false | true, false |
| person_country | optional | Set to true to delete the country field, false to keep it | false | true, false |
| person_city | optional | Set to true to delete the city field, false to keep it | false | true, false |
| linkedin_cv | optional | Set to true to delete the LinkedIn CV field, false to keep it | false | true, false |
| linkedin_volunteerings | optional | Set to true to delete the LinkedIn volunteerings field, false to keep it | false | true, false |
| started_education_linkedin | optional | Set to true to delete the education start date field, false to keep it | false | true, false |
| first_job_start_linkedin | optional | Set to true to delete the first job start date field, false to keep it | false | true, false |
| contact_location | optional | Set to true to delete the location field, false to keep it | false | true, false |
| contact_academic_title | optional | Set to true to delete the academic title field, false to keep it | false | true, false |
| person_state | optional | Set to true to delete the state field, false to keep it | false | true, false |
| person_native_german | optional | Set to true to delete the native German field, false to keep it | false | true, false |
| person_scooling_country | optional | Set to true to delete the schooling country field, false to keep it | false | true, false |
| contact_linkedin_image_url | optional | Set to true to delete the LinkedIn image URL field, false to keep it | false | true, false |
| person_linkedin_followers | optional | Set to true to delete the LinkedIn followers field, false to keep it | false | true, false |
| person_linkedin_connections | optional | Set to true to delete the LinkedIn connections field, false to keep it | false | true, false |
| lead_position | optional | Set to true to delete the position field, false to keep it | false | true, false |
| lead_position_cleaned | optional | Set to true to delete the cleaned position field, false to keep it | false | true, false |
| lead_seniority | optional | Set to true to delete the seniority field, false to keep it | false | true, false |
| lead_departement | optional | Set to true to delete the department field, false to keep it | false | true, false |
| still_at_company | optional | Set to true to delete the still at company field, false to keep it | false | true, false |
| lead_start_date | optional | Set to true to delete the start date field, false to keep it | false | true, false |
| lead_end_date | optional | Set to true to delete the end date field, false to keep it | false | true, false |
| lead_seniority_enum | optional | Set to true to delete the seniority enum field, false to keep it | false | true, false |
| lead_departement_enum | optional | Set to true to delete the department enum field, false to keep it | false | true, false |
| lead_position_clean_plural_dativ | optional | Set to true to delete the position plural dative field, false to keep it | false | true, false |
| lead_position_clean_plural_nominativ | optional | Set to true to delete the position plural nominative field, false to keep it | false | true, false |
| lead_summary | optional | Set to true to delete the lead summary field, false to keep it | false | true, false |
| lead_sources | optional | Array of lead source values to delete. Every value supplied in the array is removed from the stored array field | ["apollo","sales_navigator"] | lusha, clay, apollo, north_data, d7_lead_finder, storeleads, build_with, sales_navigator |
| lead_qualified_ws | optional | Set to true to delete the workspace qualification field, false to keep it | false | true, false |

curl -X POST "https://web-production-603a8.up.railway.app/contact_delete_fields" \
  -H "api_key: {{your_api_key}}" \
  -H "Content-Type: application/json" \
  -d '{ "people_id": "00000000-0000-0000-0000-000000000000", "lead_id": "00000000-0000-0000-0000-000000000000",
        "people_workspace_id": "00000000-0000-0000-0000-000000000000", "leads_workspace_id": "00000000-0000-0000-0000-000000000000",
        "contact_linkedin": "https://linkedin.com/in/john-doe", "contact_xing": "https://xing.com/profile/john-doe",
        "contact_email_valid": "john.doe@example.com", "contact_email_catch_all": "contact@example.com",
        "contact_email_invalid": "invalid@example.com", "contact_email_unsure": "unsure@example.com",
        "contact_first_name_cleaned": false, "contact_last_name_cleaned": false,
        "person_gender": false, "person_language": false,
        "contact_estimated_birth_year": false, "contact_birth_year": false, "contact_birth_date": false,
        "person_country": false, "person_city": false,
        "linkedin_cv": false, "linkedin_volunteerings": false,
        "started_education_linkedin": false, "first_job_start_linkedin": false,
        "contact_location": false, "contact_academic_title": false,
        "person_state": false, "person_native_german": false, "person_scooling_country": false,
        "contact_linkedin_image_url": false, "person_linkedin_followers": false, "person_linkedin_connections": false,
        "lead_position": false, "lead_position_cleaned": false, "lead_seniority": false, "lead_departement": false,
        "still_at_company": false, "lead_start_date": false, "lead_end_date": false,
        "lead_seniority_enum": false, "lead_departement_enum": false,
        "lead_position_clean_plural_dativ": false, "lead_position_clean_plural_nominativ": false,
        "lead_summary": false, "lead_sources": [ "apollo", "sales_navigator" ], "lead_qualified_ws": false }'
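The request body mixes three kinds of parameters: identifier fields that take the exact value to delete, boolean flags that clear a field when true, and the `lead_sources` array. A minimal sketch of a payload builder follows; `build_delete_fields_payload` is a hypothetical helper, and it assumes that omitting an optional flag has the same effect as sending false.

```python
from typing import Dict, Iterable, Optional


def build_delete_fields_payload(
    people_id: str,
    lead_id: str,
    clear_fields: Iterable[str] = (),
    remove_values: Optional[Dict[str, str]] = None,
    lead_sources: Iterable[str] = (),
) -> dict:
    """Assemble a /contact_delete_fields request body."""
    payload: dict = {"people_id": people_id, "lead_id": lead_id}
    # Boolean flags: true asks the API to delete that field.
    for name in clear_fields:
        payload[name] = True
    # Identifier fields carry the exact stored value to delete.
    payload.update(remove_values or {})
    # Each listed source is removed from the stored lead_sources array.
    sources = list(lead_sources)
    if sources:
        payload["lead_sources"] = sources
    return payload


payload = build_delete_fields_payload(
    "00000000-0000-0000-0000-000000000000",
    "00000000-0000-0000-0000-000000000000",
    clear_fields=["lead_summary", "person_city"],
    remove_values={"contact_email_valid": "john.doe@example.com"},
    lead_sources=["apollo"],
)
```

Sending only the fields that should change keeps the request short and makes accidental deletions easier to spot in review.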
Response Parameters
| Parameter | Type | Description | Example |
|---|---|---|---|
| success | boolean | Indicates whether the operation completed successfully. true = operation completed (even if no fields were deleted), false = operation failed due to error. | true |
| message | string | Detailed message describing what operations were performed. The possible formats are listed below. | Successfully completed: People: Set 2 fields to NULL; Leads: Set 3 fields to NULL; Leads: Removed 1 items from lead_sources; People: Deleted 1 identifier records; Leads: Deleted 2 email identifier records; Leads Workspace: Set lead_qualified_ws to NULL |

Message formats:
- • Success: "Successfully completed: <list of operations>"
- • People Operations: "People: Set N fields to NULL" (boolean fields set to NULL in db_people); "People: Deleted N identifier records" (identifier records deleted from db_people_identifiers for LinkedIn/Xing)
- • Leads Operations: "Leads: Set N fields to NULL" (boolean fields set to NULL in db_leads); "Leads: Removed N items from lead_sources" (sources removed from lead_sources array); "Leads: Deleted N email identifier records" (email identifier records deleted from db_leads_identifiers)
- • Workspace Operations: "People Workspace: No fields to update (table only contains IDs)" (people workspace table has no deletable fields); "Leads Workspace: Set lead_qualified_ws to NULL" (workspace qualification field set to NULL)
- • No operations: "No operations performed (no fields specified for deletion)"
- • Error: "Database error: <error details>" or "Error: <error details>"
{
"success": true,
"message": "Successfully completed: People: Set 2 fields to NULL; Leads: Set 3 fields to NULL; Leads: Removed 1 items from lead_sources; People: Deleted 1 identifier records; Leads: Deleted 2 email identifier records; Leads Workspace: Set lead_qualified_ws to NULL"
}
Field Cleaning
Overview
Field cleaning is a critical data normalization process that ensures consistency, improves matching accuracy, and prevents duplicate records in the database. All incoming data is cleaned before database operations (lookup, push, push_patch).
When Field Cleaning is Applied
- • Before company lookup operations
- • Before contact/lead lookup operations
- • Before pushing new company data
- • Before pushing new contact/lead data
- • Before updating existing records (push_patch operations)
Purpose
- • Standardize data formats for accurate matching
- • Remove inconsistencies and variations
- • Enable reliable deduplication
- • Improve data quality
Pre-Processing Stage (Applied to ALL Fields)
Before any field-specific cleaning, ALL string fields undergo standardization:
1. Whitespace Stripping
Purpose: Remove leading/trailing spaces that cause matching failures
" example.com " → "example.com"
2. Quote Normalization (normalize_quotes)
Purpose: Convert all Unicode quote characters to standard ASCII quotes
Characters Replaced:
- • ’ (U+2019 - RIGHT SINGLE QUOTATION MARK) → '
- • ‘ (U+2018 - LEFT SINGLE QUOTATION MARK) → '
- • ` (U+0060 - GRAVE ACCENT/BACKTICK) → '
- • ´ (U+00B4 - ACUTE ACCENT) → '
- • “ (U+201C - LEFT DOUBLE QUOTATION MARK) → "
- • ” (U+201D - RIGHT DOUBLE QUOTATION MARK) → "
"O’Brien’s Company" → "O'Brien's Company"
Why This Matters:
- • Smart quotes typically arrive through copy-paste from Word, PDFs, and websites
- • Database comparisons fail when quotes don't match
- • Enables consistent matching across data sources
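The replacement list above maps directly onto a translation table. The following is a minimal sketch of how `normalize_quotes` could be written; the function name comes from this documentation, but the body is an assumption and not the platform's actual code.

```python
# Mapping taken from the "Characters Replaced" list.
_QUOTE_MAP = str.maketrans({
    "\u2019": "'",  # RIGHT SINGLE QUOTATION MARK
    "\u2018": "'",  # LEFT SINGLE QUOTATION MARK
    "\u0060": "'",  # GRAVE ACCENT / BACKTICK
    "\u00b4": "'",  # ACUTE ACCENT
    "\u201c": '"',  # LEFT DOUBLE QUOTATION MARK
    "\u201d": '"',  # RIGHT DOUBLE QUOTATION MARK
})


def normalize_quotes(value: str) -> str:
    """Replace typographic quote characters with their ASCII equivalents."""
    return value.translate(_QUOTE_MAP)


print(normalize_quotes("O\u2019Brien\u2019s Company"))
```

`str.translate` handles all six replacements in a single pass over the string, which avoids chaining six `replace` calls.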
3. Empty Field Removal
Purpose: Remove fields that are null, empty string, or whitespace-only
Removal Criteria:
- • None → Removed
- • "" → Removed
- • " " → Removed (becomes "" after strip)
Impact:
- • Reduces payload size
- • Prevents NULL constraint violations
- • Improves database performance
- • Fields not removed: 0, False, [] (valid data)
Order of Operations
- 1. Strip whitespace from all string fields
- 2. Normalize quotes in all string fields
- 3. Convert empty/whitespace fields to None
- 4. Apply field-specific cleaning (domain, LinkedIn, etc.)
- 5. Remove all None/empty fields from payload
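The five steps can be sketched as a single pass over the payload. This is an illustration under stated assumptions, not the actual implementation: `preprocess_payload` and the `field_cleaners` argument are hypothetical names, and the quote table is a compact version of the replacement list in the Quote Normalization section.

```python
from typing import Any, Callable, Dict, Optional

# Typographic quotes, backtick and acute accent mapped to ASCII quotes.
_QUOTES = str.maketrans("\u2019\u2018\u0060\u00b4\u201c\u201d", "''''\"\"")

FieldCleaner = Callable[[str], Optional[str]]


def preprocess_payload(
    payload: Dict[str, Any],
    field_cleaners: Optional[Dict[str, FieldCleaner]] = None,
) -> Dict[str, Any]:
    """Apply the documented order of operations to one payload."""
    field_cleaners = field_cleaners or {}
    cleaned: Dict[str, Any] = {}
    for key, value in payload.items():
        if isinstance(value, str):
            value = value.strip()              # 1. strip whitespace
            value = value.translate(_QUOTES)   # 2. normalize quotes
            if value == "":                    # 3. empty string becomes None
                value = None
        if isinstance(value, str) and key in field_cleaners:
            value = field_cleaners[key](value)  # 4. field-specific cleaning
        # 5. drop None and empty strings; 0, False and [] are valid data
        if value is not None and value != "":
            cleaned[key] = value
    return cleaned


result = preprocess_payload(
    {"company_domain": "  example.com  ", "note": "   ", "count": 0, "tags": []}
)
print(result)
```

Step 5 compares against `None` and the empty string explicitly instead of using truthiness, so that `0`, `False` and `[]` survive as the section requires.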
Company Field Cleaning
The clean_company_fields(data: dict) → dict function processes company data through multiple stages.
1. Domain Cleaning (clean_domain)
Purpose: Normalize website URLs to consistent domain format for reliable matching
Detailed Algorithm:
- 1. Strip Whitespace: Remove leading/trailing spaces
- 2. Protocol Removal: Remove prefixes in order of priority:
- - https://www.
- - http://www.
- - https://
- - http://
- - www.
- 3. Path Removal: Split by / and take only first part (domain)
- 4. Query Parameter Removal: Split by ? and take only first part
- 5. Null on Empty: If result is empty string → NULL
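A minimal sketch of the algorithm above follows. The function name `clean_domain` comes from this documentation; the body is an assumption. Two behaviours are inferred from the documented examples and edge cases and are not listed as algorithm steps: lowercasing (implied by "HTTP://WWW.COMPANY.DE/" becoming "company.de") and fragment removal.

```python
from typing import Optional

# Prefixes in the documented order of priority.
_PREFIXES = ("https://www.", "http://www.", "https://", "http://", "www.")


def clean_domain(value: Optional[str]) -> Optional[str]:
    """Reduce a website URL to its bare domain, or None if nothing is left."""
    if value is None:
        return None
    # Lowercasing is an assumption inferred from the documented examples.
    domain = value.strip().lower()
    for prefix in _PREFIXES:
        if domain.startswith(prefix):
            domain = domain[len(prefix):]
            break
    domain = domain.split("/")[0]  # path removal
    domain = domain.split("?")[0]  # query parameter removal
    domain = domain.split("#")[0]  # fragment removal (documented edge case)
    return domain or None


print(clean_domain("https://www.example.com/about-us?ref=home"))
```

Because the function only uses string operations, it accepts values such as `localhost:8080` unchanged and performs no format, TLD or DNS validation.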
Examples:
✓ Valid Examples:
"https://www.example.com/about-us?ref=home" → "example.com"
"HTTP://WWW.COMPANY.DE/" → "company.de"
"subdomain.example.com/products" → "subdomain.example.com"
"www.test.org" → "test.org"
"https://api.service.com/v1/endpoint" → "api.service.com"
✗ Invalid (becomes NULL):
"" (empty string) → NULL
" " (whitespace only) → NULL
"https://" (no domain after protocol) → NULL
Edge Cases:
Handling:
- • Subdomains: Preserved intact (e.g., blog.company.com)
- • Fragments: Removed (e.g., example.com#section → example.com)
- • Multiple Slashes: Only first part kept
- • Port Numbers: Preserved (e.g., localhost:8080)
- • International Domains: Preserved as-is
Validation Logic:
- • Does NOT validate actual domain format
- • Does NOT check TLD validity (.com, .de)
- • Does NOT perform DNS lookups
- • Simply extracts and normalizes domain portion
- • Allows localhost, 127.0.0.1
Error Handling:
- • No try-except block needed (string operations only)
- • Empty results after cleaning → NULL
Fields Cleaned:
- • company_domain
- • All domain identifiers in company_domains array
Impact on Matching:
Without Cleaning:
"https://www.example.com" ≠ "example.com" → Creates duplicate
With Cleaning:
"https://www.example.com" = "example.com" → Prevents duplicate
2. LinkedIn URL Cleaning (clean_linkedin_url)
Purpose: Validate and standardize LinkedIn company page URLs
Detailed Algorithm:
- 1. Validation Check: URL must contain linkedin.com/company/
- 2. URL Splitting: Split URL by linkedin.com/company/
- 3. Slug Extraction:
- - Take everything after linkedin.com/company/
- - Remove trailing paths (split by /, take first part)
- - Remove query parameters (split by ?, take first part)
- 4. Length Validation: Slug must be at least 2 characters long
- 5. Reconstruction: Build URL as https://www.linkedin.com/company/{slug}
- 6. Null on Failure: Set to NULL if any validation fails
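The six steps can be sketched as follows. The function name `clean_linkedin_url` comes from this documentation; the body is an assumption and may differ from the actual code, for example in which exceptions it catches.

```python
from typing import Optional

_MARKER = "linkedin.com/company/"


def clean_linkedin_url(value: Optional[str]) -> Optional[str]:
    """Return the canonical company page URL, or None if validation fails."""
    try:
        # 1. Validation check: only /company/ URLs are accepted.
        if not value or _MARKER not in value:
            return None
        # 2-3. Split on the marker, then drop sub-pages and query parameters.
        slug = value.split(_MARKER, 1)[1]
        slug = slug.split("/")[0].split("?")[0]
        # 4. Length validation: the slug needs at least 2 characters.
        if len(slug) < 2:
            return None
        # 5. Reconstruction with a fixed scheme and host.
        return f"https://www.linkedin.com/company/{slug}"
    except (AttributeError, TypeError, IndexError):
        # 6. Any processing error yields None.
        return None


print(clean_linkedin_url("https://de.linkedin.com/company/bmw-group?trk=public"))
```

Rebuilding the URL from the slug is what removes locale and mobile host prefixes: everything before the marker is discarded instead of being rewritten.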
Examples:
✓ Valid:
"https://www.linkedin.com/company/microsoft/" → "https://www.linkedin.com/company/microsoft"
"linkedin.com/company/google/about/" → "https://www.linkedin.com/company/google"
"https://de.linkedin.com/company/bmw-group?trk=public" → "https://www.linkedin.com/company/bmw-group"
"http://www.linkedin.com/company/apple" → "https://www.linkedin.com/company/apple"
✗ Invalid (becomes NULL):
"linkedin.com/school/stanford-university" → NULL
"https://www.linkedin.com/company/a" → NULL
"linkedin.com/company/" → NULL
"https://www.linkedin.com/in/person-name" → NULL
"https://facebook.com/company" → NULL
IMPORTANT VALIDATION RULES:
- • ONLY accepts /company/ URLs
- • REJECTS /school/ URLs → NULL (despite earlier documentation suggesting otherwise)
- • REJECTS personal profiles (/in/) → NULL
- • REJECTS showcase pages (/showcase/) → NULL
Edge Cases:
- • Locale Prefixes: Removed automatically (e.g., de.linkedin.com → www.linkedin.com)
- • Mobile URLs: Handled (e.g., m.linkedin.com → www.linkedin.com)
- • Query Parameters: All removed (e.g., ?trk=public, ?original_referer=)
- • Trailing Slashes: Removed from slug
- • Sub-pages: Removed (e.g., /about, /people, /jobs)
Slug Validation:
- • Minimum length: 2 characters
- • Can contain: letters, numbers, hyphens, underscores
- • No validation of actual company existence on LinkedIn
- • No case transformation (preserves original case)
Error Handling:
- • Try-except block catches malformed URLs
- • Any exception during processing → NULL
- • Missing parts after split → NULL
- • Empty slug after extraction → NULL
Common Rejection Scenarios:
| Input Type | Example | Result | Reason |
|---|---|---|---|
| School page | linkedin.com/school/stanford | NULL | Not /company/ |
| Personal profile | linkedin.com/in/john-doe | NULL | Not /company/ |
| Showcase page | linkedin.com/showcase/product | NULL | Not /company/ |
| Short slug | linkedin.com/company/a | NULL | Slug < 2 chars |
| No slug | linkedin.com/company/ | NULL | Empty slug |
| Wrong platform | xing.com/companies/test | NULL | Not LinkedIn |
Fields Cleaned:
- • company_linkedin
- • All LinkedIn identifiers in the company_linkedins array
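The six steps of the LinkedIn algorithm can be sketched in a few lines of Python. This is a minimal illustration written from the description above, not the platform's actual source. The function name matches the one in the heading, but the body and the use of Python's None for NULL are assumptions.

```python
def clean_linkedin_url(url):
    """Return a canonical LinkedIn company URL, or None if validation fails."""
    marker = "linkedin.com/company/"
    try:
        # 1. Validation check: only /company/ URLs are accepted.
        if marker not in url:
            return None
        # 2-3. Keep what follows the marker, then drop sub-pages and query strings.
        slug = url.split(marker, 1)[1]
        slug = slug.split("/")[0].split("?")[0]
        # 4. Length validation: reject empty or one-character slugs.
        if len(slug) < 2:
            return None
        # 5. Reconstruction with a fixed scheme and host (drops de., m., http://).
        return f"https://www.linkedin.com/company/{slug}"
    except Exception:
        # 6. Any failure, such as a non-string input, yields None.
        return None


print(clean_linkedin_url("https://de.linkedin.com/company/bmw-group?trk=public"))
print(clean_linkedin_url("linkedin.com/school/stanford-university"))
```

Because the host is rebuilt rather than edited, locale and mobile prefixes disappear without any special handling.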
Impact on Matching:
Without Cleaning:
"https://de.linkedin.com/company/bmw?trk=public" ≠ "linkedin.com/company/bmw" → Creates duplicate
With Cleaning:
"https://de.linkedin.com/company/bmw?trk=public" = "https://www.linkedin.com/company/bmw" → Prevents duplicate
3. Email Cleaning
⚠️ NOTE: Email cleaning is NOT implemented in the current codebase.
Emails are handled through the pre-processing stage only (whitespace stripping and quote normalization).
Current Behavior:
- • Whitespace is stripped (pre-processing)
- • Quotes are normalized (pre-processing)
- • No case transformation
- • No validation
- • Field passes through as-is after pre-processing
Actual vs Expected Behavior:
Actual (Current):
"Info@Company.COM" → "Info@Company.COM"
Expected (Not Implemented):
"Info@Company.COM" → "info@company.com"
Fields Affected:
- • company_email
- • All email identifiers
To Implement:
- • Lowercase conversion
- • @ symbol validation
- • Email format validation
4. Phone Number Cleaning
⚠️ NOTE: Phone number cleaning is NOT implemented in the current codebase.
Phone numbers are handled through the pre-processing stage only (whitespace stripping and quote normalization).
Current Behavior:
- • Whitespace is stripped (pre-processing)
- • Quotes are normalized (pre-processing)
- • No format transformation
- • No validation
- • Field passes through as-is after pre-processing
Actual vs Expected Behavior:
Actual (Current):
"+49 (30) 1234-5678" → "+49 (30) 1234-5678"
Expected (Not Implemented):
"+49 (30) 1234-5678" → "+493012345678"
Fields Affected:
- • company_phone
- • All phone identifiers
To Implement:
- • Remove non-digit characters (except +)
- • Normalize to E.164 format
- • Add + prefix for international numbers
- • Validate phone number format
5. Social Media URL Cleaning
Multiple functions clean different social media platforms to consistent formats. Each platform has strict validation rules and will set the field to NULL if validation fails.
Platform Output Formats & Special Notes:
| Platform | Required Pattern | Output Format | Special Notes |
|---|---|---|---|
| Instagram | instagram.com | https://www.instagram.com/{slug} | Accepts any Instagram URL |
| Facebook | facebook.com | https://www.facebook.com/{slug} | Accepts any Facebook URL |
| Xing | xing.com/pages/ | https://www.xing.com/pages/{slug} | ONLY /pages/ URLs |
| Pinterest | pinterest.com | https://de.pinterest.com/{slug} | Always German locale |
| TikTok | tiktok.com/@ | https://www.tiktok.com/@{slug} | Requires @ symbol |
| YouTube | youtube.com | https://www.youtube.com/{slug} | Preserves path format (/c/, /channel/, /@, /user/) |
| Twitter/X | x.com | https://x.com/{slug} | ONLY x.com, NOT twitter.com |
Common Algorithm (ALL Platforms):
- 1. Remove protocol (http://, https://)
- 2. Remove www. prefix
- 3. Remove query parameters and fragments
- 4. Remove trailing slashes
- 5. Convert to lowercase (except for case-sensitive platforms)
- 6. Keep platform-specific path structure
Instagram
"https://www.instagram.com/company/?hl=en" → "https://www.instagram.com/company"
Facebook
"https://www.facebook.com/Page/" → "https://www.facebook.com/Page"
Xing
"https://www.xing.com/pages/name" → "https://www.xing.com/pages/name"
Pinterest
"https://www.pinterest.com/boards/" → "https://de.pinterest.com/boards"
TikTok
"https://www.tiktok.com/@name?lang=en" → "https://www.tiktok.com/@name"
YouTube
"https://www.youtube.com/c/Channel" → "https://www.youtube.com/c/Channel"
Twitter/X
"https://twitter.com/Handle?ref_src=twsrc" → NULL
Error Recovery:
- • No partial saves - invalid URLs become NULL
- • No fallback attempts - strict validation
- • No logging of failed URLs - silent NULL assignment
- • Fields with NULL are removed from payload before database insertion
Detailed Platform Algorithms
Each platform has its own specialized cleaning function with unique validation rules.
Pinterest (clean_pinterest)
Purpose: Validate and normalize Pinterest profile URLs
Detailed Algorithm:
- 1. Validation Check: URL must contain pinterest.com
- 2. URL Splitting: Split URL by pinterest.com/
- 3. Slug Extraction: Take everything after pinterest.com/, remove trailing paths (split by /, take first part), remove query parameters (split by ?, take first part)
- 4. Length Validation: Slug must be at least 2 characters long
- 5. Reconstruction: Build URL as https://de.pinterest.com/{slug}
- 6. Null on Failure: Set to NULL if any validation fails
Examples:
✓ Valid:
"https://www.pinterest.com/company_boards/" → "https://de.pinterest.com/company_boards"
"pinterest.com/nike/ideas?source=web" → "https://de.pinterest.com/nike"
"https://de.pinterest.com/cocacola" → "https://de.pinterest.com/cocacola"
✗ Invalid (becomes NULL):
"https://www.pinterest.com/p" → NULL
"pinterest.com/" → NULL
Special Note:
- • Output ALWAYS uses de.pinterest.com (German locale)
- • Input can be from any Pinterest locale (www, de, fr, etc.)
- • This standardizes to German locale for consistency
Error Handling:
- • Try-except block catches malformed URLs
- • Any exception during processing → NULL
Fields Cleaned:
- • company_pinterest
TikTok (clean_tiktok)
Purpose: Validate and normalize TikTok profile URLs
Detailed Algorithm:
- 1. Validation Check: URL must contain tiktok.com/@
- 2. URL Splitting: Split URL by tiktok.com/
- 3. Slug Extraction: Take everything after tiktok.com/, remove trailing paths (split by /, take first part), remove query parameters (split by ?, take first part)
- 4. Length Validation: Slug must be at least 2 characters long (includes @)
- 5. Reconstruction: Build URL as https://www.tiktok.com/{slug}
- 6. Null on Failure: Set to NULL if any validation fails
Examples:
✓ Valid:
"https://www.tiktok.com/@companyname?lang=en" → "https://www.tiktok.com/@companyname"
"tiktok.com/@nike/video/12345" → "https://www.tiktok.com/@nike"
"https://www.tiktok.com/@cocacola" → "https://www.tiktok.com/@cocacola"
✗ Invalid (becomes NULL):
"https://www.tiktok.com/companyname" → NULL
"https://www.tiktok.com/@a" → NULL
"tiktok.com/@" → NULL
Special Requirements:
- • URL MUST contain tiktok.com/@
- • The @ symbol is required and preserved in the slug
- • Without @ symbol, URL is considered invalid → NULL
Error Handling:
- • Try-except block catches malformed URLs
- • Any exception during processing → NULL
Fields Cleaned:
- • company_tiktok
YouTube (clean_youtube)
Purpose: Validate and normalize YouTube channel URLs
Detailed Algorithm:
- 1. Validation Check: URL must contain youtube.com
- 2. URL Splitting: Split URL by youtube.com/
- 3. Slug Extraction: Take everything after youtube.com/, remove trailing paths (split by /, take first part), remove query parameters (split by ?, take first part)
- 4. Length Validation: Slug must be at least 2 characters long
- 5. Reconstruction: Build URL as https://www.youtube.com/{slug}
- 6. Null on Failure: Set to NULL if any validation fails
Examples:
✓ Valid:
"https://www.youtube.com/c/CompanyChannel" → "https://www.youtube.com/c/CompanyChannel"
"youtube.com/channel/UCxxxxxx/videos" → "https://www.youtube.com/channel/UCxxxxxx"
"https://m.youtube.com/@CompanyName?feature=share" → "https://www.youtube.com/@CompanyName"
"https://www.youtube.com/user/OldUsername" → "https://www.youtube.com/user/OldUsername"
✗ Invalid (becomes NULL):
"https://www.youtube.com/c" → NULL
"youtube.com/" → NULL
"https://vimeo.com/channel" → NULL
Supported YouTube URL Formats:
- • /c/{channel-name} (custom channel URL)
- • /channel/{channel-id} (channel ID)
- • /@{handle} (new YouTube handle format)
- • /user/{username} (legacy username)
Error Handling:
- • Try-except block catches malformed URLs
- • Any exception during processing → NULL
Fields Cleaned:
- • company_youtube
Twitter/X (clean_twitter)
Purpose: Validate and normalize Twitter/X profile URLs
Detailed Algorithm:
- 1. Validation Check: URL must contain x.com (NEW Twitter branding)
- 2. URL Splitting: Split URL by x.com/
- 3. Slug Extraction: Take everything after x.com/, remove trailing paths (split by /, take first part), remove query parameters (split by ?, take first part)
- 4. Length Validation: Slug must be at least 2 characters long
- 5. Reconstruction: Build URL as https://x.com/{slug}
- 6. Null on Failure: Set to NULL if any validation fails
Examples:
✓ Valid:
"https://x.com/CompanyHandle?ref_src=twsrc" → "https://x.com/CompanyHandle"
"x.com/nike/status/12345" → "https://x.com/nike"
"https://www.x.com/cocacola" → "https://x.com/cocacola"
✗ Invalid (becomes NULL):
"https://twitter.com/CompanyHandle" → NULL
"https://x.com/a" → NULL
"x.com/" → NULL
IMPORTANT NOTES:
- • ONLY accepts x.com (new Twitter branding)
- • REJECTS twitter.com URLs → NULL
- • This is a strict migration to X branding
- • Old twitter.com URLs will be marked as invalid
Migration Impact:
- • Existing twitter.com URLs in database will be marked as NULL during cleaning
- • Users must provide x.com URLs for validation to pass
- • This enforces the Twitter → X rebranding
Error Handling:
- • Try-except block catches malformed URLs
- • Any exception during processing → NULL
Fields Cleaned:
- • company_twitter
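The Pinterest and Twitter/X functions share the same six-step shape, so one helper can express both. The sketch below is an illustration built from the steps and examples above. The helper clean_profile_url and its parameters are hypothetical names, and None stands in for NULL. TikTok and YouTube are left out because their examples imply extra rules (the @ prefix, multi-segment paths) that the six steps alone do not capture.

```python
def clean_profile_url(url, marker, split_on, output_prefix):
    """Shared pattern: validate, split, extract slug, check length, rebuild."""
    try:
        if marker not in url:                      # 1. validation check
            return None
        slug = url.split(split_on, 1)[1]           # 2. split (IndexError if missing)
        slug = slug.split("/")[0].split("?")[0]    # 3. drop sub-paths and queries
        if len(slug) < 2:                          # 4. length validation
            return None
        return output_prefix + slug                # 5. reconstruction
    except Exception:
        return None                                # 6. null on failure


def clean_pinterest(url):
    # Output is always rebuilt on the German locale host.
    return clean_profile_url(url, "pinterest.com", "pinterest.com/",
                             "https://de.pinterest.com/")


def clean_twitter(url):
    # Only x.com is accepted; twitter.com does not contain the marker.
    return clean_profile_url(url, "x.com", "x.com/", "https://x.com/")


print(clean_pinterest("pinterest.com/nike/ideas?source=web"))
print(clean_twitter("https://twitter.com/CompanyHandle"))
```

One consequence of a plain substring check is worth knowing: in this sketch a URL such as https://netflix.com/title also contains x.com and would be rewritten to https://x.com/title. The source does not say whether the real function guards against this.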
6. Company Name Cleaning
⚠️ NOTE: Company name cleaning is NOT implemented in the current codebase.
Company names are handled through the pre-processing stage only (whitespace stripping and quote normalization).
Current Behavior:
- • Whitespace is stripped (pre-processing)
- • Quotes are normalized (pre-processing)
- • No legal form removal
- • No case transformation
- • Field passes through as-is after pre-processing
Actual vs Expected Behavior:
Actual (Current):
" Company Name GmbH " → "Company Name GmbH"
Expected (Not Implemented):
" Company Name GmbH " → "Company Name"
Fields Affected:
- • company_name_cleaned
- • All name identifiers
Validation:
None currently implemented
To Implement:
- • Remove legal form suffixes (GmbH, AG, Inc., Ltd., etc.)
- • Normalize spacing
- • Title case conversion
- • Handle compound legal forms
7-11. Other Company Fields
⚠️ NOTE: The following cleaning functions (7-11) are NOT implemented in the current codebase.
All these fields are handled through the pre-processing stage only (whitespace stripping and quote normalization).
Fields Affected:
- • company_legal_form - No standardization or mapping
- • company_street - No normalization
- • company_city - No title case conversion
- • company_zip - No format normalization
- • company_region - No transformation
- • company_country - No ISO code conversion
- • company_tags - No lowercase or deduplication
- • company_sources - No lowercase or deduplication
- • company_employees_research - No integer conversion
- • company_employees_linkedin - No integer conversion
- • company_founded_year - No integer conversion
- • company_linkedin_followers - No integer conversion
Current Behavior:
- • ALL these fields pass through with only whitespace stripping and quote normalization
- • No validation
- • No format transformation
- • No data normalization beyond pre-processing
Examples of Current Behavior:
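| Field | Input | Current Output | Expected (Not Implemented) |
|---|---|---|---|
| Legal Form | "gmbh" | "gmbh" | "GMBH" |
| City | " münchen " | "münchen" | "München" |
| ZIP Code | "10 115" | "10 115" | "10115" |
| Tags | "Software, SAAS, software" | "Software, SAAS, software" | ["software", "saas"] |
Validation:
None currently implemented for any of these fields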
To Implement:
- • Legal form standardization and mapping
- • Address normalization (street, city, zip, region)
- • Country code ISO conversion
- • Array parsing and deduplication
- • Integer conversion and validation
Contact Field Cleaning
The clean_contact_fields(data: dict) → dict function processes contact/lead data through multiple stages.
Pre-Processing Stage (Applied to ALL Fields)
Before any field-specific cleaning, ALL string fields undergo standardization:
1. Quote Normalization (normalize_quotes)
Same as company cleaning - converts all Unicode quote characters to standard ASCII
2. Empty Field Removal
Same as company cleaning - removes None, "", and whitespace-only fields
Note:
Contact cleaning does NOT include explicit whitespace stripping in the pre-processing loop (only quote normalization), but whitespace is still handled during field-specific cleaning.
Order of Operations:
- 1. Normalize quotes in all string fields
- 2. Apply field-specific cleaning (LinkedIn, Xing, etc.)
- 3. Remove all None/empty fields from payload
1. Email Cleaning (clean_email)
Note:
Same algorithm as company email cleaning
Examples:
"John.Doe@Company.COM" → "john.doe@company.com"
" info+sales@EXAMPLE.de " → "info+sales@example.de"
Fields Cleaned:
- • contact_email_valid
- • contact_email_invalid
- • contact_email_catch_all
- • contact_email_unsure
- • All email identifiers in respective status arrays
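The two examples are consistent with trimming surrounding whitespace and lowercasing the whole address. A minimal sketch under that assumption follows. The source does not show the body of clean_email, so the implementation below is a guess that reproduces the examples and nothing more.

```python
def clean_email(value):
    """Trim and lowercase an email address; return None for empty input."""
    if not isinstance(value, str):
        return None
    cleaned = value.strip().lower()
    # An address that is empty after trimming is treated as missing.
    return cleaned or None


print(clean_email(" info+sales@EXAMPLE.de "))
```

No format validation is attempted here, since none is described for this function.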
2. LinkedIn Profile Cleaning (clean_linkedin_profile)
Purpose: Validate and normalize LinkedIn personal profile URLs
Detailed Algorithm:
- 1. Validation Check: URL must contain linkedin.com/in/
- 2. URL Splitting: Split URL by linkedin.com/in/
- 3. Slug Extraction: Take everything after linkedin.com/in/, remove trailing paths (split by /, take first part), remove query parameters (split by ?, take first part)
- 4. Length Validation: Slug must be at least 2 characters long
- 5. Reconstruction: Build URL as https://www.linkedin.com/in/{slug}
- 6. Null on Failure: Set to NULL if any validation fails
Examples:
✓ Valid:
"https://www.linkedin.com/in/john-doe/" → "https://www.linkedin.com/in/john-doe"
"linkedin.com/in/jane-smith-12345678?trk=profile" → "https://www.linkedin.com/in/jane-smith-12345678"
"https://de.linkedin.com/in/max-mustermann" → "https://www.linkedin.com/in/max-mustermann"
"http://m.linkedin.com/in/person-name/details" → "https://www.linkedin.com/in/person-name"
✗ Invalid (becomes NULL):
"https://www.linkedin.com/company/microsoft" → NULL
"https://www.linkedin.com/in/a" → NULL
"linkedin.com/in/" → NULL
"https://www.linkedin.com/pub/john-doe/12/345/678" → NULL
"https://xing.com/profile/person" → NULL
IMPORTANT VALIDATION RULES:
- • ONLY accepts /in/ URLs (personal profiles)
- • REJECTS /company/ URLs → NULL
- • REJECTS /school/ URLs → NULL
- • REJECTS /pub/ URLs → NULL (old public profile format)
Edge Cases:
- • Locale Prefixes: Removed automatically (e.g., de.linkedin.com → www.linkedin.com)
- • Mobile URLs: Handled (e.g., m.linkedin.com → www.linkedin.com)
- • Query Parameters: All removed (e.g., ?trk=profile, ?originalSubdomain=de)
- • Trailing Slashes: Removed from slug
- • Sub-pages: Removed (e.g., /details, /recent-activity)
Slug Validation:
- • Minimum length: 2 characters
- • Can contain: letters, numbers, hyphens, underscores
- • Supports vanity URLs (e.g., john-doe) and numeric IDs (e.g., person-12345678)
- • No validation of actual profile existence on LinkedIn
- • No case transformation (preserves original case)
Error Handling:
- • Try-except block catches malformed URLs
- • Any exception during processing → NULL
- • Missing parts after split → NULL
- • Empty slug after extraction → NULL
Impact on Matching:
Without Cleaning:
"https://de.linkedin.com/in/john-doe?trk=profile" ≠ "linkedin.com/in/john-doe"
With Cleaning:
"https://www.linkedin.com/in/john-doe" = "https://www.linkedin.com/in/john-doe"
Common Rejection Scenarios:
| Input Type | Example | Result | Reason |
|---|---|---|---|
| Company page | linkedin.com/company/microsoft | NULL | Not /in/ |
| School page | linkedin.com/school/stanford | NULL | Not /in/ |
| Public profile | linkedin.com/pub/john-doe/1/2/3 | NULL | Not /in/ |
| Short slug | linkedin.com/in/a | NULL | Slug < 2 chars |
| No slug | linkedin.com/in/ | NULL | Empty slug |
| Wrong platform | xing.com/profile/test | NULL | Not LinkedIn |
Fields Cleaned:
- • contact_linkedin
- • All LinkedIn identifiers in the contact_linkedins array
Name Cleaning
First Name (clean_first_name)
" john-paul " → "John-Paul"
"marie-josé" → "Marie-José"
Last Name (clean_last_name)
"von müller" → "Von Müller"
"o'brien" → "O'Brien"
Fields Cleaned:
- • contact_first_name_cleaned
- • contact_last_name_cleaned
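The four examples can be reproduced by trimming the value and capitalizing the first letter of each part, where parts are separated by spaces, hyphens, or apostrophes. The sketch below assumes that rule. The helper name clean_name is hypothetical, and the real clean_first_name and clean_last_name may apply further logic that the source does not show.

```python
import re

# A letter is capitalized when it starts the string or follows a separator.
_PART_START = re.compile(r"(^|[\s\-'])(\w)")


def clean_name(value):
    """Trim a name and capitalize each space-, hyphen-, or apostrophe-separated part."""
    if not isinstance(value, str):
        return None
    stripped = value.strip()
    if not stripped:
        return None
    return _PART_START.sub(lambda m: m.group(1) + m.group(2).upper(), stripped)


print(clean_name(" john-paul "))
print(clean_name("o'brien"))
```

Letters after the first in each part are left untouched, so mixed-case input such as "McDonald" is preserved rather than flattened.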
Field Normalization
Gender Normalization
Maps variations to: male, female, diverse, unknown
Language Normalization
Converts to ISO 639-1 codes
Academic Title Normalization
Standardizes academic titles
Seniority Normalization
Standard levels: entry, mid, senior, manager, director, vp, c-level
Department Normalization
Standard departments: sales, marketing, it, hr, finance, operations, rd, customer_success
Validation and Null Replacement Rules
All Social Media & Profile URL Cleaning
Common rules that apply to ALL URL cleaning functions:
✓ Validation Rules
- 1. Minimum slug length: 2 characters
- 2. Must contain correct domain
- 3. Must match expected path pattern
- 4. Exception handling active
- 5. No empty slugs allowed
✗ Results in NULL
- • Slug shorter than 2 chars
- • Wrong platform domain
- • Wrong path pattern
- • Processing exception
- • Empty slug after extraction
Strict Platform-Specific Requirements
| Platform | Required Pattern | Rejects | Output on Invalid |
|---|---|---|---|
| Company LinkedIn | linkedin.com/company/ | /school/, /in/, /showcase/ | NULL |
| Contact LinkedIn | linkedin.com/in/ | /company/, /school/, /pub/ | NULL |
| Company Xing | xing.com/pages/ | /people/, /profile/, /companies/ | NULL |
| Contact Xing | xing.com/people/ | /pages/, /profile/ | NULL |
| Instagram | instagram.com | Any non-Instagram domain | NULL |
| Facebook | facebook.com | Any non-Facebook domain | NULL |
| Pinterest | pinterest.com | Any non-Pinterest domain | NULL |
| TikTok | tiktok.com/@ | URLs without @ symbol | NULL |
| YouTube | youtube.com | Any non-YouTube domain | NULL |
| Twitter/X | x.com | twitter.com (old domain) | NULL |
Key Insights
NULL Behavior
- • Invalid URLs become NULL, NOT empty string
- • NULL fields are removed from payload
- • No partial saves or fallback attempts
- • Failed validations are silent (no errors)
Validation Strictness
- • Twitter must be x.com (NOT twitter.com)
- • TikTok must have @ symbol
- • Xing company vs personal paths are different
- • LinkedIn company vs personal paths are different
Cleaning Impact on Matching
Why Cleaning Matters for Deduplication
Scenario 1: Company Domain Matching
Without Cleaning:
"https://www.Example.COM/about" ≠ "example.com"
With Cleaning:
"example.com" = "example.com"
Scenario 2: LinkedIn Profile Matching
Without Cleaning:
"https://de.linkedin.com/in/john-doe?trk=profile" ≠ "linkedin.com/in/john-doe/"
With Cleaning:
"https://www.linkedin.com/in/john-doe" = "https://www.linkedin.com/in/john-doe"
Scenario 3: Email Matching
Without Cleaning:
"John.Doe@Company.COM" ≠ "john.doe@company.com"
With Cleaning:
"john.doe@company.com" = "john.doe@company.com"
Company Lookup Priority
- 1. Domain (highest priority - most unique)
- 2. LinkedIn URL
- 3. Email
- 4. Phone
- 5. Company name + address
Contact Lookup Priority
- 1. Email (highest priority - most unique)
- 2. LinkedIn profile URL
- 3. Xing profile URL
- 4. First + Last name + Company
Data Quality Benefits
Consistency
All data stored in uniform format
Searchability
Easier to query and filter
Matching Accuracy
95%+ reduction in false negatives
Storage Efficiency
Eliminates redundant variations
API Performance
Faster comparison operations
User Experience
Predictable data format in responses
Application Scope
✓ Field Cleaning is Applied In:
- ✓ All lookup operations (company_lookup, contact_lookup)
- ✓ All push operations (company_push, contact_push)
- ✓ All push_patch operations (company_push_patch, contact_push_patch)
- ✓ CSV upload processing
✗ NOT Applied In:
- ✗ GET operations (data already cleaned in database)
- ✗ DELETE operations (no new data)
Order of Execution
- 1. Receive API request with raw data: Incoming payload from client application
- 2. Apply pre-processing: Whitespace stripping, quote normalization, empty field removal
- 3. Apply field-specific cleaning: Domain, LinkedIn, email, phone, social media, etc.
- 4. Remove NULL/empty fields from payload: Final cleanup before database operation
- 5. Proceed to database lookup/insert/update: Cleaned data ready for database operations
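The order of execution can be pictured as a small pipeline: pre-process every string, run a field-specific cleaner where one exists, then drop whatever ended up empty. The sketch below is illustrative only. The names clean_payload and field_cleaners and the exact set of quote characters are assumptions, not taken from the codebase.

```python
# Map common Unicode quotation marks to their ASCII equivalents.
QUOTE_MAP = str.maketrans({
    "\u2018": "'", "\u2019": "'", "\u201a": "'", "\u201b": "'",
    "\u201c": '"', "\u201d": '"', "\u201e": '"', "\u201f": '"',
})


def normalize_quotes(value):
    return value.translate(QUOTE_MAP)


def clean_payload(data, field_cleaners):
    """Pre-process, apply per-field cleaners, and drop None/empty fields."""
    cleaned = {}
    for key, value in data.items():
        # Pre-processing: quote normalization and whitespace stripping.
        if isinstance(value, str):
            value = normalize_quotes(value).strip()
        # Field-specific cleaning; a cleaner returns None on failed validation.
        cleaner = field_cleaners.get(key)
        if cleaner is not None and value:
            value = cleaner(value)
        # Final cleanup: None and empty strings never reach the database.
        if value is None or value == "":
            continue
        cleaned[key] = value
    return cleaned


payload = {"company_name": "  \u201cAcme\u201d GmbH ", "company_twitter": "https://twitter.com/acme"}
print(clean_payload(payload, {"company_twitter": lambda url: None}))
```

Dropping failed fields rather than storing empty strings matches the NULL behavior described earlier: an invalid URL simply disappears from the payload.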
Summary
Company Cleaning Functions: 11
- 1. Domain cleaning
- 2. LinkedIn URL cleaning
- 3. Email cleaning
- 4. Phone cleaning
- 5. Instagram cleaning
- 6. Facebook cleaning
- 7. Xing cleaning
- 8. Pinterest cleaning
- 9. TikTok cleaning
- 10. YouTube cleaning
- 11. Twitter cleaning
Contact Cleaning Functions: 10
- 1. Email cleaning
- 2. LinkedIn profile cleaning
- 3. Xing profile cleaning
- 4. First name cleaning
- 5. Last name cleaning
- 6. Gender normalization
- 7. Language normalization
- 8. Academic title normalization
- 9. Position cleaning
- 10. Seniority/Department normalization
Applied In:
- • All lookup operations (company_lookup, contact_lookup)
- • All push operations (company_push, contact_push)
- • All push_patch operations (company_push_patch, contact_push_patch)
Result:
- • Consistent data format across entire database
- • Accurate deduplication and matching
- • Improved data quality and reliability
- • Better user experience with predictable outputs
API Logic
This section explains the logic flow for all API endpoints. Each subsection describes what happens when an API endpoint is called, which functions are used, and what those functions do.
Rate Limits
Comprehensive information about rate limiting, queue systems, and connection pooling for all API endpoints.
Overall API Rate Limits Analysis
Important: No Global Rate Limit
There is NO GLOBAL RATE LIMIT across all endpoints. Each endpoint has its own independent rate limit per IP address.
Complete Rate Limits by Endpoint
| Endpoint | Rate Limit | Requests/Second |
|---|---|---|
| /company_lookup | 1000/second | 1,000 |
| /company_push | 5000/second | 5,000 |
| /company_push_patch | 5000/second | 5,000 |
| /contact_lookup | 1000/second | 1,000 |
| /contact_push | 5000/second | 5,000 |
| /contact_push_patch | 5000/second | 5,000 |
| /csv_upload | 1000/second | 1,000 |
| /company_delete_fields | 1000/second | 1,000 |
| /contact_delete_fields | 1000/second | 1,000 |
| /company_get | 1000/second | 1,000 |
| /contact_get | 1000/second | 1,000 |
| /health | No limit | Unlimited |
| /redis_status | No limit | Unlimited |
| /job_status/{job_id} | No limit | Unlimited |
| /queue_stats | No limit | Unlimited |
Actual System Bottlenecks
⚠️ Critical: Rate Limiter is NOT the Bottleneck!
1. Queue System (Write Endpoints)
2. Database Connection Pool
(Avg 2 queries per request)
Maximum Requests Summary
| Measure | Single IP | System-Wide (All IPs) |
|---|---|---|
| Rate Limiter Allows | 28,000 req/sec | Unlimited |
| Queue Can Accept (writes) | 20 req/sec | 20 req/sec |
| Database Can Handle | ~750 req/sec | ~750 req/sec |
| Actual Capacity (writes) | 20 req/sec | 20 req/sec |
| Actual Capacity (reads) | 1,000 req/sec | Unlimited |
| Per Minute | Single IP | System-Wide (All IPs) |
|---|---|---|
| Rate Limiter Allows | 1,680,000 req/min | Unlimited |
| Actual Capacity (writes) | 1,200 req/min | 1,200 req/min |
| Actual Capacity (reads) | 60,000 req/min | Unlimited |
Real-World System Capacity Scenarios
Scenario 1: Read-Only Traffic (Lookups & Gets)
Endpoints:
- /company_lookup (cached 90%)
- /contact_lookup (cached 90%)
- /company_get
- /contact_get
With 90% cache hit rate:
Example: 1,000 req/sec incoming
• 900 req/sec cached (no DB needed)
• 100 req/sec hit database
• 100 × 0.08 sec = 8 concurrent connections
Bottleneck: Rate limiter (intentional throttling)
Maximum: 1,000 req/sec per IP per endpoint
System-wide: UNLIMITED (multiple IPs)
Scenario 2: Write Traffic (Push/Patch)
Endpoints:
- /company_push
- /company_push_patch
- /contact_push
- /contact_push_patch
Bottleneck: Background workers
Rate limiter allows: 5,000 req/sec per IP
System can handle: ~20 req/sec sustained
Gap: 250× higher than actual capacity!
Scenario 3: Mixed Traffic (Typical Production)
Typical production load distribution:
Example: 100 req/sec total
• 80 req/sec reads → ~8 DB connections (90% cached)
• 20 req/sec writes → All 50 workers busy
Result: System at capacity (workers saturated)
Bottleneck: Workers (20 req/sec write limit)
Key Insights
1. Rate Limiter is NOT the Bottleneck
The rate limiter is 250× HIGHER than actual capacity for write operations.
Why? Rate limiter prevents abuse, but workers prevent overload.
2. Different Limits for Different Operations
Read operations (lookups, gets):
- Rate limiter: 1,000 req/sec per IP
- Database: ~750 req/sec (with caching much higher)
- Bottleneck: Rate limiter (intentional)
Write operations (push, patch):
- Rate limiter: 5,000 req/sec per IP
- Workers: ~20 req/sec
- Bottleneck: Workers (need to scale)
3. No Global Rate Limit
Each endpoint has independent limits. A client can simultaneously:
- Send 1,000 req/sec to /company_lookup
- AND 5,000 req/sec to /company_push
- AND 1,000 req/sec to /contact_lookup
- ... all from the same IP!
Total: 28,000 req/sec from single IP (rate limiter allows)
But system will return queue_full at ~20 req/sec for writes.
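The arithmetic behind these figures is easy to recompute from the per-endpoint table. The sketch below copies the eleven listed limits and the ~20 req/sec worker throughput from this section; the function names are illustrative. Summing the listed limits gives 27,000 req/sec, slightly below the 28,000 req/sec quoted here, and the source does not say which limit accounts for the difference.

```python
# Per-IP limits in requests per second, copied from the endpoint table.
PER_IP_LIMITS = {
    "/company_lookup": 1000,
    "/company_push": 5000,
    "/company_push_patch": 5000,
    "/contact_lookup": 1000,
    "/contact_push": 5000,
    "/contact_push_patch": 5000,
    "/csv_upload": 1000,
    "/company_delete_fields": 1000,
    "/contact_delete_fields": 1000,
    "/company_get": 1000,
    "/contact_get": 1000,
}
WORKER_THROUGHPUT = 20  # sustained write jobs per second, as stated above


def rate_limiter_ceiling(limits=PER_IP_LIMITS):
    """Total req/sec one IP may send when it saturates every limited endpoint."""
    return sum(limits.values())


def write_gap(limit=PER_IP_LIMITS["/company_push"], throughput=WORKER_THROUGHPUT):
    """How many times the write rate limit exceeds worker throughput."""
    return limit / throughput


print(rate_limiter_ceiling(), write_gap())
```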
Scaling Recommendations
Option 1: Add Global Rate Limit
@app.middleware("http")
async def global_rate_limit(...)
Limit: 100 req/sec per IP across ALL endpoints. More realistic than 28,000 req/sec.
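The counting logic such a middleware would need can be sketched without any web framework. The class below is a hypothetical fixed-window counter keyed by client IP, using only the standard library. It is not the SlowAPI implementation and is not thread-safe as written; a real deployment would add a lock and evict idle entries.

```python
import time


class GlobalRateLimiter:
    """Fixed-window counter: at most limit_per_second requests per IP per second."""

    def __init__(self, limit_per_second=100, clock=time.monotonic):
        self.limit = limit_per_second
        self.clock = clock
        self._windows = {}  # ip -> [window_second, request_count]

    def allow(self, ip):
        now_second = int(self.clock())
        window = self._windows.setdefault(ip, [now_second, 0])
        if window[0] != now_second:
            # A new one-second window has started: reset the counter.
            window[0], window[1] = now_second, 0
        if window[1] >= self.limit:
            return False  # caller should answer with HTTP 429
        window[1] += 1
        return True


limiter = GlobalRateLimiter(limit_per_second=100)
print(limiter.allow("203.0.113.7"))
```

The middleware would call allow() with the client address before dispatching to any endpoint, so the budget is shared across all routes instead of being granted per route.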
Option 2: Scale Workers
start_workers(num_workers=200)
New capacity: ~80 req/sec (4× improvement)
Option 3: Horizontal Scaling
Deploy multiple API instances:
- Instance 1: 50 workers = 20 req/sec
- Instance 2: 50 workers = 20 req/sec
- Instance 3: 50 workers = 20 req/sec
Total: 60 req/sec capacity
Option 4: Lower Rate Limits
@limiter.limit("50/second")
Match rate limits to actual capacity instead of 5000/second
Rate Limiting System (Per Endpoint)
Implementation: SlowAPI library with in-memory storage. Rate limits are applied per IP address with independent limits for each client.
| Endpoint | Rate Limit | Requests/Second | Purpose |
|---|---|---|---|
| /company_lookup | 1000/second | 1,000 | Read-only lookup |
| /company_push | 5000/second | 5,000 | Write operations |
| /company_push_patch | 5000/second | 5,000 | Write operations |
| /contact_lookup | 1000/second | 1,000 | Read-only lookup |
| /contact_push | 5000/second | 5,000 | Write operations |
| /contact_push_patch | 5000/second | 5,000 | Write operations |
| /csv_upload | 1000/second | 1,000 | Job submission |
| /company_delete_fields | 1000/second | 1,000 | Delete operations |
| /contact_delete_fields | 1000/second | 1,000 | Delete operations |
| /company_get | 1000/second | 1,000 | Read operations |
| /contact_get | 1000/second | 1,000 | Read operations |
Rate Limit Exceeded Response (HTTP 429)
{
  "error": "Rate limit exceeded: 1000 per 1 second",
  "detail": "Too many requests"
}
Why Different Limits?
- 1,000 req/sec: Fast database reads with caching, controlled deletions, job submission only
- 5,000 req/sec: Write operations need burst capacity for bulk imports; queue system provides backpressure control
Queue System (Background Job Processing)
Max Queue Size
5,000 jobs
Background Workers
50 threads
Job Timeout
55 seconds
Queue Full Response
{
  "error": "queue_full",
  "message": "System at capacity, please retry in a few seconds",
  "queue_size": 5001,
  "retry_after_seconds": 5,
  "max_queue_size": 5000
}
Timeout Response
{
  "error": "timeout",
  "job_id": "abc-123-def-456",
  "message": "Request timeout, poll /job_status/{job_id}"
}
Request Flow:
- Request arrives
- Rate limiter checks: Under limit for this IP? → If NO, return 429 error
- Queue size check: Queue under 5,000? → If NO, return queue_full response
- Enqueue job with UUID
- Wait for result (55 seconds with automatic retry)
- Background worker processes job
- Return result or timeout with job_id for polling
Queue Architecture
Queue Backpressure (Prevents Overload)
The queue size check acts as backpressure. When the queue fills up (5,000 jobs), new requests are rejected with a queue_full error. This prevents memory exhaustion and maintains system stability during traffic spikes.
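A bounded queue gives this behavior almost for free. The sketch below uses Python's standard queue module to show the idea: enqueue without blocking, and turn a full queue into the queue_full response shown earlier. The function name enqueue_job and the shape of the success response are assumptions; the actual job queue implementation is not shown in the source.

```python
import queue
import uuid

MAX_QUEUE_SIZE = 5000
job_queue = queue.Queue(maxsize=MAX_QUEUE_SIZE)


def enqueue_job(payload, jobs=job_queue, retry_after_seconds=5):
    """Try to enqueue a job; reject immediately instead of blocking when full."""
    job_id = str(uuid.uuid4())
    try:
        jobs.put_nowait((job_id, payload))
    except queue.Full:
        # Backpressure: tell the client to back off rather than buffer more work.
        return {
            "error": "queue_full",
            "message": "System at capacity, please retry in a few seconds",
            "queue_size": jobs.qsize(),
            "retry_after_seconds": retry_after_seconds,
            "max_queue_size": jobs.maxsize,
        }
    return {"job_id": job_id, "status": "queued"}


print(enqueue_job({"company_domain": "example.com"})["status"])
```

Rejecting at the door keeps memory bounded: the queue can never hold more than its configured maximum, no matter how fast requests arrive.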
Queue System: Detailed Overview
System Overview
Maximum requests the API can accept
Actual throughput (DB connection limited)
The asynchronous queue system allows accepting up to 5× more requests than the system can process, providing a buffer for traffic spikes while maintaining stable processing rates.
Request Outcomes
Scenario A: Fast Processing (< 30 seconds)
Request:
POST /company_push_patch
{
  "company_name": "Example Corp",
  "company_domain": "example.com"
}
Response (within 30s):
{
  "company_main_id": "uuid-123",
  "company_workspace_id": "uuid-456",
  "status_company": "created",
  "status_company_workspace": null,
  "error": null,
  "input": {
    "company_name": "Example Corp",
    "company_domain": "example.com"
  }
}
✓ Job completed immediately - feels synchronous to the client
Scenario B: Timeout (> 30 seconds)
Initial Response (after 30s timeout):
{
  "error": "timeout",
  "message": "Job not completed within 30 seconds",
  "job_id": "abc-123-def-456",
  "status": "queued",
  "queue_position": 142,
  "estimated_wait_seconds": 14.2,
  "check_status_url": "/job_status/abc-123-def-456"
}
Then poll for result:
GET /job_status/abc-123-def-456
{
  "job_id": "abc-123-def-456",
  "status": "completed",
  "result": {
    "company_main_id": "uuid-123",
    "company_workspace_id": "uuid-456",
    "status_company": "created",
    "status_company_workspace": null,
    "error": null,
    "input": { "company_name": "Example Corp", "company_domain": "example.com" }
  },
  "metadata": {
    "job_type": "company_push_patch",
    "queued_at": "2025-10-11T12:00:00Z",
    "started_at": "2025-10-11T12:00:30Z",
    "completed_at": "2025-10-11T12:00:31Z",
    "status": "completed",
    "worker_id": 3
  }
}
⚠ Job queued - client needs to poll for result
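On the client side, the two scenarios plus the rate-limit and queue_full responses reduce to a small decision table. The function below is a hypothetical helper that inspects a response and says what to do next. The field names come from the examples in this section, while the one-second delay after HTTP 429 is an assumption, since the source gives no retry interval for that case.

```python
def next_action(status_code, body):
    """Decide what a client should do with a response from a queued endpoint.

    Returns ("retry", delay_seconds), ("poll", status_url), or ("done", body).
    """
    if status_code == 429:
        return ("retry", 1.0)  # assumed delay; the rate limit window is one second
    error = body.get("error") if isinstance(body, dict) else None
    if error == "queue_full":
        return ("retry", float(body.get("retry_after_seconds", 5)))
    if error == "timeout" and body.get("job_id"):
        # Prefer the URL supplied by the server, else build it from the job_id.
        url = body.get("check_status_url") or f"/job_status/{body['job_id']}"
        return ("poll", url)
    return ("done", body)


print(next_action(200, {"error": "timeout", "job_id": "abc-123-def-456"}))
```

Keeping this logic in one place means a client can treat push endpoints as synchronous in the common case and fall back to polling only when the server asks for it.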
Endpoint Classification
Queued Endpoints (5,000 req/sec)
These endpoints use the queue system:
- • POST /company_push
- • POST /company_push_patch
- • POST /contact_push
- • POST /contact_push_patch
Direct Endpoints (1,000 req/sec)
These remain synchronous (no queue):
- • POST /company_lookup (read-only, fast)
- • POST /contact_lookup (read-only, fast)
Performance Under Different Load Conditions
Light Load (0-1,000 req/sec)
- • All requests complete within 30s
- • No timeouts
- • Immediate responses
- • User experience: Synchronous feel
Medium Load (1,000-3,000 req/sec)
- • Most requests complete within 30s
- • Occasional timeouts for burst traffic
- • 90%+ immediate responses
- • User experience: Mostly synchronous
Heavy Load (3,000-5,000 req/sec)
- • Many timeouts (jobs queued > 30s)
- • Clients need to poll for results
- • 100% acceptance rate (no rejections)
- • User experience: Async with polling
Overload (>5,000 req/sec)
- • Rate limiter kicks in
- • Requests beyond 5,000/sec get HTTP 429
- • Still better than before (was failing at 200/sec)
- • User experience: Rate limit errors
Before vs After Queue System
| Metric | Before (Synchronous) | After (Queue System) |
|---|---|---|
| Rate limit | 1,000/sec | 5,000/sec |
| Connection pool | 20 | 120 |
| Actual capacity | ~200 req/sec | ~1,000 req/sec |
| Acceptance rate | ~200 req/sec | 5,000 req/sec |
| Failures at 1,000 req/sec | 80% | 0% |
Database Connection Pool
Min Connections
10
Always kept alive in pool
Max Connections
120
Matches Supabase Pro tier limit
Connection Pool Benefits:
- Performance: Reuses connections (no TCP handshake overhead)
- Resource Management: Limits database connections, prevents exhaustion
- Thread Safety: Multiple workers can request connections concurrently
Why 120 connections?
Supports 1,000 req/sec with avg 120ms query time: 1000 × 0.12 = 120 concurrent queries
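This sizing rule is Little's law: the average number of queries in flight equals the arrival rate times the average time each query takes. The sketch below applies it in both directions. The helper names are illustrative, and the second call is one reading that reproduces the ~750 req/sec database figure quoted earlier, assuming 2 queries per request at 80 ms each.

```python
def concurrent_queries(requests_per_second, avg_query_seconds, queries_per_request=1):
    """Average number of connections busy at once (Little's law)."""
    return requests_per_second * queries_per_request * avg_query_seconds


def max_request_rate(pool_size, avg_query_seconds, queries_per_request=1):
    """Highest request rate a pool of pool_size connections can sustain."""
    return pool_size / (queries_per_request * avg_query_seconds)


# 1,000 req/sec at 120 ms per query keeps about 120 connections busy.
print(round(concurrent_queries(1000, 0.12)))
# 120 connections, 2 queries per request, 80 ms each: about 750 req/sec.
print(round(max_request_rate(120, 0.08, queries_per_request=2)))
```

The same formula explains the cached-read example: 100 req/sec reaching the database at 80 ms each occupies about 8 connections.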
ThreadedConnectionPool (psycopg2)
Application Layer Connection Management
Connection Lifecycle
Workers request connections via getconn(), use them for database operations, then return them via putconn(). Connections are reused, avoiding the overhead of establishing new TCP connections for each query.
Rate Limiting + Queue Interaction
Request Flow
Request arrives
Client sends HTTP request to API endpoint
✓ Rate limiter check: Under 5000/sec for this IP?
✓ Queue size check: Queue under 5000?
Background worker processes job
One of 50 workers picks up and executes the job
Two Layers of Protection
Rate limiting happens BEFORE queue check. This dual-layer protection prevents both abuse (rate limiter) and system overload (queue limit).
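The ordering of the two checks can be sketched as a single admission function. The function name admit and its return labels are hypothetical; only the two limits and their order come from this document.

```python
RATE_LIMIT_PER_SEC = 5000  # per client IP
MAX_QUEUE_SIZE = 5000      # system-wide queue capacity


def admit(requests_this_second: int, queue_size: int) -> str:
    """Return the admission decision for one incoming request."""
    # Layer 1: per-IP rate limit, checked first.
    if requests_this_second >= RATE_LIMIT_PER_SEC:
        return "rate_limited"  # corresponds to an HTTP 429 response
    # Layer 2: backpressure on the shared queue.
    if queue_size >= MAX_QUEUE_SIZE:
        return "queue_full"
    return "enqueued"
```

Because the rate limiter runs first, a client that exceeds its per-IP limit is rejected without ever consuming queue capacity.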
System Performance
| Component | Configuration | Purpose | Limit |
|---|---|---|---|
| Rate Limiter | 1,000-5,000 req/sec per IP | Prevent abuse | Per client IP |
| Queue | Max 5,000 jobs | Buffer requests | System-wide |
| Workers | 50 background threads | Process jobs | ~20 jobs/sec throughput |
| Connection Pool | 10-120 connections | Database access | Supabase limit: 1,000 |
| Job Timeout | 55 seconds | Prevent hanging | Railway: 60 sec |
| Retry | 1 automatic retry | Handle edge cases | 110 sec total |
Request Path
Client → Rate Limiter (1000-5000/s) → Queue Check (< 5000 jobs) → Enqueue → Worker (50 workers) → Database (120 conn) → Response
Load Scenarios
✓ Low Load (< 10 req/sec)
- • Queue Size: 0-50 jobs
- • Connection Pool: 10-20 connections in use
- • Response Time: < 1 second
○ Medium Load (100 req/sec)
- • Queue Size: 200-400 jobs
- • Connection Pool: 50-80 connections in use
- • Response Time: 4-8 seconds
⚠ High Load (500 req/sec)
- • Queue Size: 2000-4000 jobs
- • Connection Pool: 100-120 connections in use
- • Response Time: 40-80 seconds
- • Warning: Approaching queue_full threshold
🔴 Overload (1000+ req/sec sustained)
- • Queue Size: Hits 5,000 max
- • Connection Pool: 120 connections in use
- • Many requests return queue_full error
- • Action Required: Client backs off, retries after 5 seconds
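On the client side, the recommended back-off can be sketched as below. The helper send_with_backoff is hypothetical, and the response shape {"error": "queue_full"} is an assumption, since this document does not show the exact payload of a queue_full response.

```python
import time
from typing import Callable


def send_with_backoff(
    send: Callable[[], dict],
    max_attempts: int = 3,
    wait_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call `send` until it stops reporting queue_full or attempts run out."""
    response = send()
    for _ in range(max_attempts - 1):
        if response.get("error") != "queue_full":  # assumed error marker
            break
        sleep(wait_seconds)  # back off before retrying
        response = send()
    return response
```

The sleep function is injectable so the retry logic can be exercised in tests without real delays.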
Job Status Polling
Endpoint
GET /job_status/{job_id}
Used when a request times out and the client needs to check the result later.
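A client-side polling loop for this endpoint might look like the sketch below. It relies only on the status values documented in this section (queued, processing, completed, failed, not_found). The helper poll_job, the polling interval, and the fetch_status callable (which would wrap the HTTP GET) are assumptions for illustration.

```python
import time
from typing import Callable

TERMINAL_STATUSES = {"completed", "failed", "not_found"}


def poll_job(
    fetch_status: Callable[[str], dict],
    job_id: str,
    interval_seconds: float = 2.0,
    max_polls: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the job reaches a terminal status or max_polls is exhausted."""
    status = fetch_status(job_id)
    for _ in range(max_polls - 1):
        if status["status"] in TERMINAL_STATUSES:
            break
        sleep(interval_seconds)
        status = fetch_status(job_id)
    return status
```

A client could also use estimated_wait_seconds from a queued response to choose the next interval instead of a fixed delay.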
1. Queued
{
"status": "queued",
"result": null,
"queue_position": 42,
"estimated_wait_seconds": 126
}
2. Processing
{
"status": "processing",
"result": null,
"queue_position": 0,
"estimated_wait_seconds": 0
}
3. Completed
{
"status": "completed",
"result": {
"company_main_id": "uuid",
"company_workspace_id": "uuid",
"status_company": "created",
"status_company_workspace": "created",
"error": null
},
"queue_position": 0,
"estimated_wait_seconds": 0
}
4. Failed
{
"status": "failed",
"result": {
"error": "Database connection failed"
},
"queue_position": 0,
"estimated_wait_seconds": 0
}
5. Not Found
{
"status": "not_found",
"result": null,
"queue_position": 0,
"estimated_wait_seconds": 0
}
Queue Statistics
Endpoint
GET /queue_stats
Response
{
"queue_size": 142,
"active_workers": 50,
"jobs_completed": 125847,
"jobs_failed": 23,
"average_processing_time_seconds": 2.3,
"queue_capacity": 5000,
"queue_utilization_percentage": 2.84
}
Metrics Explanation
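In the sample response, queue_utilization_percentage is consistent with queue_size expressed as a percentage of queue_capacity. The sketch below reproduces that figure; the formula is inferred from the sample values and is an assumption, not a confirmed definition.

```python
def queue_utilization_percentage(queue_size: int, queue_capacity: int) -> float:
    """Share of the queue currently occupied, as a percentage."""
    return round(queue_size * 100 / queue_capacity, 2)


# Values from the sample /queue_stats response.
print(queue_utilization_percentage(142, 5000))
```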
Health Check Indicators
✓ Healthy System
⚠ Warning State
🔴 Critical State
System Tuning Parameters
To increase system throughput, consider adjusting the following parameters:
1. Increase Workers
await queue_manager.start_workers(num_workers=100)  # Was 50
2. Increase Connection Pool
pool = ThreadedConnectionPool(minconn=20, maxconn=200, dsn=...)  # Was 10-120
3. Increase Queue Size
MAX_QUEUE_SIZE = 10000  # Was 5000
4. Decrease Timeout
result = await queue_manager.wait_for_job(job_id, timeout=30)  # Was 55
Railway Deployment Considerations
Railway Timeout
Hard Limit: 60 seconds per request
Why 55-second timeout in code?
- Railway kills requests at 60 seconds
- 55-second timeout leaves 5-second buffer
- Buffer allows time to return timeout response
- Client receives job_id for polling
Automatic Retry Logic
if result.get('error') == 'timeout':
# Retry by waiting for the job again
retry_result = await queue_manager.wait_for_job(result['job_id'], timeout=55)
return retry_result
Request Flow:
- First wait: 55 seconds → timeout
- Automatic retry: wait another 55 seconds
- Total time: up to 110 seconds
- If still timeout: Return job_id to client for polling
Why retry?
- Job might complete just after first timeout
- Gives job extra time before returning polling response
- Reduces need for client polling
Startup Configuration
🚀 FastAPI server starting...
⚡ Event loop: uvloop (2-4× faster on Linux)
📊 Database pool: 10-120 connections
🔒 Rate limit: 5000 requests/second per IP
⏱️ Queue timeout: 55 seconds (with 5s buffer before Railway's 60s timeout)
🛡️ Max queue size: 5000 (backpressure enabled)
Summary Table
| Component | Configuration | Purpose | Limit |
|---|---|---|---|
| Rate Limiter | 1,000-5,000 req/sec per IP | Prevent abuse | Per client IP |
| Queue | Max 5,000 jobs | Buffer requests | System-wide |
| Workers | 50 background threads | Process jobs | Throughput: ~20/sec |
| Connection Pool | 10-120 connections | Database access | Supabase limit: 1,000 |
| Job Timeout | 55 seconds | Prevent hanging | Railway: 60 sec |
| Retry | 1 automatic retry | Handle edge cases | 110 sec total |
Request Path
Companies
API endpoints for managing company data including lookup, retrieval, creation, updates, and field deletion.
/company_lookup
Purpose: Search for an existing company in the database using identifiers like name, domain, or LinkedIn URL.
Logic Flow
1. Receive Request Data
Accepts: company_name, company_domain, company_linkedin, workspace_id
2. Clean the Data
Calls: clean_company_fields() from processing_cleaning_funtions/company_field_cleaning.py
What it does: Normalizes the input data (removes trailing slashes, converts to lowercase, standardizes URLs)
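As an illustration of this kind of normalization, the sketch below trims whitespace, lowercases, and removes trailing slashes. It is not the actual clean_company_fields() implementation; the helper name normalize_url_like is hypothetical, and dropping the http(s) scheme is an assumption based on the scheme-less URLs shown in this document's examples.

```python
import re


def normalize_url_like(value: str) -> str:
    """Trim, lowercase, drop the scheme, and remove trailing slashes."""
    cleaned = value.strip().lower()
    cleaned = re.sub(r"^https?://", "", cleaned)  # assumption: scheme is dropped
    return cleaned.rstrip("/")
```

Normalizing before the cache and database steps means that inputs differing only in case or trailing slashes resolve to the same lookup.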
3. Check Cache
- • Generates a unique cache key based on the input parameters
- • Checks if this lookup was done recently
- • If found in cache: Returns the cached result immediately
- • If not in cache: Continues to database lookup
4. Lookup Company
Calls: lookup_company() from processing_lookup_funtions/company_lookup.py
What it does:
- • Searches the v_company_lookup database view
- • Checks if the provided name, domain, or LinkedIn matches any company
- • Uses OR logic: matches if ANY identifier matches
- • Returns FIRST match found (LIMIT 1)
- • If workspace_id is provided: Also checks if this company is connected to that workspace
Returns: company_main_id and company_workspace_id (if found)
5. Cache the Result
Saves the result to cache for 1 hour. This includes "not found" results to avoid repeated database queries.
6. Return Response
Returns: {company_main_id, company_workspace_id, error}
- company_main_id: The unique ID of the company (or null if not found)
- company_workspace_id: The ID of the workspace connection (or null if not found or no workspace_id provided)
Error Handling in API File
The process_company_lookup() function uses try/except to handle errors:
Try Block:
- • Validates that data is provided (if not: returns error "No data provided")
- • Sets connection pool for lookup module
- • Calls cleaning function to normalize data
- • Calls lookup function (which has its own internal error handling)
- • Caches the result
- • Returns formatted response
Except Block:
- • Catches any unexpected errors during the process
- • Returns response with null IDs and error message as string
- • Examples: Cleaning function fails, Cache system errors, Unexpected exceptions
Called Functions:
- • clean_company_fields(): Normalizes company data (URLs, whitespace, domains)
- • lookup_company(): Queries database for company match, manages database connection
Match Priority
When searching with multiple identifiers (e.g., both domain and LinkedIn), the system:
- • Uses OR logic: Matches if ANY identifier matches
- • Returns the FIRST match found
- • Does NOT rank or score matches
Example:
Input: company_domain = "example.com", company_linkedin = "linkedin.com/company/different-company"
Result: Returns whichever company is found first in the database
Best Practice: Use the most specific identifier you have (domain is usually most reliable)
Error Scenarios
No identifiers provided:
{company_main_id: null, company_workspace_id: null, error: "No lookup fields provided"}
Company not found:
{company_main_id: null, company_workspace_id: null, error: null}
Note: This is a valid result (company doesn't exist), not an error
Database error:
{company_main_id: null, company_workspace_id: null, error: "Database query failed: [details]"}
Company found but not in workspace:
{company_main_id: "uuid", company_workspace_id: null, error: null}
This means: Company exists globally, but not connected to the specified workspace
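A client can tell these scenarios apart by checking the three response fields in order. The sketch below shows one way to do that; classify_company_lookup and its return labels are hypothetical names, not part of the API.

```python
def classify_company_lookup(response: dict) -> str:
    """Map a /company_lookup response to one of the scenarios listed above."""
    if response.get("error"):
        return "error"
    if response.get("company_main_id") is None:
        return "not_found"  # a valid result, not an error
    if response.get("company_workspace_id") is None:
        # Also the outcome when the request carried no workspace_id.
        return "found_not_in_workspace"
    return "found_in_workspace"
```

Checking error first matters because error responses also carry null IDs and would otherwise be mistaken for "not found".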
Practical Examples
Request:
POST /company_lookup
{
"company_domain": "anthropic.com",
"workspace_id": "workspace-123"
}
Response - Company found and in workspace:
{
"company_main_id": "f7e9a8b1-1234-5678-9abc-def123456789",
"company_workspace_id": "a1b2c3d4-5678-9abc-def1-23456789abcd",
"error": null
}
Response - Company found but NOT in workspace:
{
"company_main_id": "f7e9a8b1-1234-5678-9abc-def123456789",
"company_workspace_id": null,
"error": null
}
Response - Company not found:
{
"company_main_id": null,
"company_workspace_id": null,
"error": null
}
People
API endpoints for managing contact/people data including lookup, retrieval, creation, updates, and field deletion.
/contact_lookup
Purpose: Search for an existing contact (person and/or lead) in the database using identifiers like email, LinkedIn, Xing, or name.
Logic Flow
1. Receive Request Data
Accepts: contact_email_valid, contact_email_catch_all, contact_email_invalid, contact_email_unsure, contact_linkedin, contact_xing, contact_first_name, contact_first_name_cleaned, contact_last_name, contact_last_name_cleaned, company_id, workspace_id
2. Clean the Data
Calls: clean_contact_fields() from processing_cleaning_funtions/contact_field_cleaning.py
What it does: Normalizes the input data (lowercase emails, standardizes URLs, trims whitespace)
3. Check Cache
- • Generates a unique cache key based on the input parameters
- • Checks if this lookup was done recently
- • If found in cache: Returns the cached result immediately
- • If not in cache: Continues to database lookup
4. Lookup Contact (Two-Phase Lookup)
Calls: lookup_contact() from processing_lookup_funtions/contact_lookup.py
Phase 1 - Lead Lookup (with company_id):
Searches for a lead (person working at a specific company)
Matches by:
- • LinkedIn + company_id
- • Xing + company_id
- • Email addresses (any type)
- • First name + Last name + company_id
Uses OR logic: Matches if ANY condition is true
If match found: Returns both lead_id and person_id, then STOPS (doesn't run Phase 2)
Phase 2 - Person Lookup (without company_id):
Only runs if Phase 1 found nothing
Searches for a person regardless of company
Matches by:
- • LinkedIn (without company requirement)
- • Xing (without company requirement)
If match found: Returns person_id only (lead_id is null)
Workspace Lookup:
If workspace_id is provided and person/lead is found:
- • Checks if the person is connected to that workspace
- • Checks if the lead is connected to that workspace
Returns: lead_id, person_id, lead_workspace_id, people_workspace_id
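The two-phase logic can be sketched over in-memory records. This is an illustration only: the real lookup_contact() queries the database, and the record shapes, the query keys (emails, linkedin, xing, name), and the function name lookup_contact_sketch are all assumptions. The workspace lookup is omitted.

```python
def lookup_contact_sketch(query: dict, leads: list, people: list) -> dict:
    """Two-phase lookup over in-memory records (hypothetical shapes)."""
    company_id = query.get("company_id")
    emails = set(query.get("emails", []))
    linkedin = query.get("linkedin")
    xing = query.get("xing")
    name = query.get("name")  # (first, last) tuple or None

    # Phase 1: lead lookup. Any single condition is enough (OR logic).
    for lead in leads:
        same_company = company_id is not None and lead["company_id"] == company_id
        if (
            emails & set(lead["emails"])
            or (same_company and linkedin and linkedin == lead["linkedin"])
            or (same_company and xing and xing == lead["xing"])
            or (same_company and name and name == lead["name"])
        ):
            return {"lead_id": lead["lead_id"], "person_id": lead["person_id"]}

    # Phase 2: person lookup by social profile only, ignoring company.
    for person in people:
        if (linkedin and linkedin == person["linkedin"]) or (
            xing and xing == person["xing"]
        ):
            return {"lead_id": None, "person_id": person["person_id"]}

    return {"lead_id": None, "person_id": None}
```

Phase 2 is reached only when the Phase 1 loop finishes without a match, which mirrors the "STOPS" behaviour described above.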
5. Cache the Result
Saves the result to cache for 1 hour. This includes "not found" results to avoid repeated database queries.
6. Return Response
Returns: {lead_id, person_id, lead_workspace_id, people_workspace_id, error}
- lead_id: The unique ID of the lead (or null if not found)
- person_id: The unique ID of the person (or null if not found)
- lead_workspace_id: The ID of the lead's workspace connection (or null)
- people_workspace_id: The ID of the person's workspace connection (or null)
Understanding Person vs Lead
- Person: Represents an individual with basic information (name, demographics, career history)
- Lead: Represents a person's connection to a specific company (their position, department, start date at that company)
- One person can have multiple leads (if they worked at multiple companies)
- The lookup prioritizes finding leads (person + company match) over just finding the person
Example:
- • John Smith works at Company A as CEO (Lead 1)
- • John Smith works at Company B as Advisor (Lead 2)
- • Both leads link to the same Person record (John Smith)
Match Priority
The lookup follows a strict priority order:
- 1. Email (any type) - Always checked first in Phase 1
- 2. LinkedIn + company_id - Checked in Phase 1
- 3. Xing + company_id - Checked in Phase 1
- 4. Name + company_id - Checked in Phase 1
- 5. LinkedIn alone - Only checked in Phase 2 (if Phase 1 fails)
- 6. Xing alone - Only checked in Phase 2 (if Phase 1 fails)
Why this order?
- • Emails are unique and most reliable
- • Social profiles with company context are more specific than just names
- • Phase 2 is a fallback for when we don't have company context
Returns first match: Like company lookup, returns the first match found
Error Scenarios
No identifiers provided:
{lead_id: null, person_id: null, lead_workspace_id: null, people_workspace_id: null, error: "No lookup fields available"}
Contact not found:
{lead_id: null, person_id: null, lead_workspace_id: null, people_workspace_id: null, error: null}
Note: This is a valid result (contact doesn't exist), not an error
Person found but not lead (Phase 2 success):
{lead_id: null, person_id: "uuid", lead_workspace_id: null, people_workspace_id: null, error: null}
This means: Person exists but we don't know their connection to the specified company
Lead found but not in workspace:
{lead_id: "uuid", person_id: "uuid", lead_workspace_id: null, people_workspace_id: "uuid", error: null}
This means: Lead exists, person is in workspace, but lead is not in workspace
Database error:
{lead_id: null, person_id: null, lead_workspace_id: null, people_workspace_id: null, error: "Database query failed: [details]"}
Practical Examples
Example 1: Full lead match with workspace
Request:
POST /contact_lookup
{
"contact_email_valid": "john@anthropic.com",
"company_id": "company-uuid-123",
"workspace_id": "workspace-456"
}
Response:
{
"lead_id": "lead-789",
"person_id": "person-abc",
"lead_workspace_id": "lead-ws-xyz",
"people_workspace_id": "people-ws-def",
"error": null
}
Interpretation: John works at this company (lead found), and both the lead and person are in the workspace.
Example 2: Person found but no lead (Phase 2 success)
Request:
POST /contact_lookup
{
"contact_linkedin": "linkedin.com/in/johndoe",
"company_id": "company-uuid-999"
}
Response:
{
"lead_id": null,
"person_id": "person-abc",
"lead_workspace_id": null,
"people_workspace_id": null,
"error": null
}
Interpretation: John exists in database, but we don't have a record of him working at company 999. Phase 1 found nothing (no lead at that company), but Phase 2 found John by LinkedIn.
Example 3: Not found
Request:
POST /contact_lookup
{
"contact_email_valid": "unknown@example.com"
}
Response:
{
"lead_id": null,
"person_id": null,
"lead_workspace_id": null,
"people_workspace_id": null,
"error": null
}
Interpretation: This person doesn't exist in the database at all.
/company_get
Purpose: Retrieve complete company data including all fields and identifiers. Query parameters: company_id (required) and workspace_id (optional), e.g. company_id=xxx&workspace_id=yyy
Logic Flow
1. Receive Request Parameters
- Required: company_id
- Optional: workspace_id
2. Get Main Company Data
Queries: db_companies_main table
Retrieves all company fields:
- • company_name_cleaned, company_legal_form, b2b_b2c
- • Address fields (street, city, zip, region, country)
- • Registration numbers (tax, VAT, handels register)
- • Employee counts, founded year, description
- • Logo URL, LinkedIn size and followers
- • Arrays: company_tags, company_sources
- • Timestamps: created_at, updated_at
3. Get Company Identifiers
Queries: db_companies_dt_identifiers table
Retrieves all identifiers for this company
Organizes them by type into separate arrays:
- • company_names: Array of all name variations
- • company_domains: Array of all domains
- • company_emails: Array of all email addresses
- • company_phones: Array of all phone numbers
- • company_linkedins: Array of all LinkedIn URLs
- • company_instagrams: Array of all Instagram URLs
- • company_facebooks: Array of all Facebook URLs
- • company_xings: Array of all Xing URLs
- • company_pinterests: Array of all Pinterest URLs
- • company_tiktoks: Array of all TikTok URLs
- • company_youtubes: Array of all YouTube URLs
- • company_twitters: Array of all Twitter URLs
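The grouping step can be sketched as follows. The input row shape (a list of (type, value) pairs) and the function name group_identifiers are assumptions, since the column layout of db_companies_dt_identifiers is not shown in this document; the output keys match the arrays listed above.

```python
IDENTIFIER_TYPES = [
    "name", "domain", "email", "phone", "linkedin", "instagram",
    "facebook", "xing", "pinterest", "tiktok", "youtube", "twitter",
]


def group_identifiers(rows: list) -> dict:
    """Group (type, value) rows into one array per identifier type."""
    # Start with every array present so absent types come back as [].
    grouped = {f"company_{t}s": [] for t in IDENTIFIER_TYPES}
    for id_type, value in rows:
        key = f"company_{id_type}s"
        if key in grouped:  # unknown types are ignored in this sketch
            grouped[key].append(value)
    return grouped
```

Pre-populating every key is what guarantees that the response always contains each array, even when a company has no identifiers of that type.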
4. Get Workspace Data (if workspace_id provided)
Queries: db_companies_workspace table
Retrieves workspace-specific data:
- • company_workspace_id: The ID of the workspace connection
- • company_qualified: Qualification status in this workspace
- • company_custom_tags_ws: Custom tags for this workspace
- • Timestamps: workspace created_at, updated_at
If not provided or not found: Returns null values for workspace fields
5. Return Response
Returns: Complete company data dictionary with:
- • All main table fields
- • All identifiers organized in arrays by type
- • Workspace data (if workspace_id was provided)
- • Timestamps renamed to indicate source table (e.g., db_companies_main_created_at)
Error Handling in API File
The process_company_get() function uses try/except to handle errors:
Try Block:
- • Validates that company_id is provided (if not: returns error "company_id is required")
- • Gets database connection from pool
- • Queries db_companies_main table for company data
- • If company not found: Returns error "Company not found"
- • Queries db_companies_dt_identifiers and organizes by type
- • If workspace_id provided: Queries db_companies_workspace table
- • Returns complete company data dictionary
Except Block:
- • Catches any unexpected errors during database queries or data processing
- • Returns error response with error message
- • Examples: Database connection fails, Query execution errors, Data formatting errors
Note:
No cleaning functions are called (get endpoints don't need cleaning). Database connection is managed internally.
Why Use Company Get?
- Company Lookup tells you IF a company exists and returns just the IDs
- Company Get gives you ALL the information about a company once you know its ID
- Use lookup first to find the company, then use get to retrieve all details
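The lookup-then-get pattern can be sketched on the client side as below. The helper fetch_company and the two injected callables are hypothetical: post_lookup stands in for a POST to /company_lookup and get_company for a GET to /company_get, so the sketch stays independent of any particular HTTP library.

```python
from typing import Callable, Optional


def fetch_company(
    post_lookup: Callable[[dict], dict],
    get_company: Callable[[str, Optional[str]], dict],
    company_domain: str,
    workspace_id: Optional[str] = None,
) -> Optional[dict]:
    """Resolve a company ID via lookup, then fetch the full record."""
    payload = {"company_domain": company_domain}
    if workspace_id is not None:
        payload["workspace_id"] = workspace_id
    found = post_lookup(payload)
    if found.get("error"):
        raise RuntimeError(found["error"])
    if found.get("company_main_id") is None:
        return None  # company does not exist; nothing to get
    return get_company(found["company_main_id"], workspace_id)
```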
Error Scenarios
Company doesn't exist:
{"error": "Company not found", "company_id": "xxx"}
This happens when the company_id doesn't exist in the database
Company exists but not in workspace:
Returns: Complete company data with workspace fields as null
company_workspace_id: null, company_qualified: null, company_custom_tags_ws: null
Database error:
{"error": "Database error: [error details]", "company_id": "xxx"}
Empty identifier arrays:
Always returns arrays (even if empty): company_domains: []
This is normal - not all companies have all types of identifiers
Practical Example
Request:
GET /company_get?company_id=f7e9a8b1-1234-5678-9abc-def123456789&workspace_id=workspace-123
Response:
{
"company_name_cleaned": "Anthropic",
"company_legal_form": "Inc",
"b2b_b2c": "B2B",
"company_name_imprint": null,
"company_street": "123 Main St",
"company_street_nr": "123",
"company_city": "San Francisco",
"company_zip": "94105",
"company_region": "California",
"company_country": "USA",
"company_tax_nr": null,
"company_vat_nr": null,
"company_handels_register": null,
"company_employees_research": "100-500",
"company_employees_linkedin": "250",
"company_founded_year": "2021",
"company_description": "AI safety and research company",
"company_logo_url": "https://...",
"company_size_linkedin": "201-500",
"company_linkedin_followers": "50000",
"company_tags": ["AI", "Research", "Safety"],
"company_sources": ["LinkedIn", "Website"],
"db_companies_main_created_at": "2024-01-15T10:30:00Z",
"db_companies_main_updated_at": "2024-03-20T14:45:00Z",
"company_names": ["Anthropic", "Anthropic PBC"],
"company_domains": ["anthropic.com", "www.anthropic.com"],
"company_emails": ["info@anthropic.com", "contact@anthropic.com"],
"company_phones": ["+1-555-0100"],
"company_linkedins": ["linkedin.com/company/anthropic"],
"company_instagrams": [],
"company_facebooks": [],
"company_xings": [],
"company_pinterests": [],
"company_tiktoks": [],
"company_youtubes": [],
"company_twitters": ["twitter.com/anthropicai"],
"company_workspace_id": "ws-abc-123",
"company_qualified": "high",
"company_custom_tags_ws": ["hot-lead", "enterprise"],
"db_companies_workspace_created_at": "2024-02-01T09:00:00Z",
"db_companies_workspace_updated_at": "2024-03-15T16:20:00Z"
}
/contact_get
Purpose: Retrieve complete contact data including person information, lead information, and all identifiers. Query parameters: lead_id (required) and workspace_id (optional), e.g. lead_id=xxx&workspace_id=yyy
Logic Flow
1. Receive Request Parameters
- Required: lead_id
- Optional: workspace_id
2. Get Lead Data
Queries: db_leads table
Retrieves all lead fields:
- people_id: Reference to the person record
- companies_main_id: Reference to the company
- lead_position, lead_position_cleaned
- lead_seniority, lead_departement
- still_at_company: Boolean indicating if person still works there
- lead_start_date, lead_end_date: Employment dates
- lead_seniority_enum, lead_departement_enum: Standardized values
- Position variations (plural forms)
- lead_summary: Summary of the lead
- lead_sources: Array of data sources
- Timestamps: created_at, updated_at
Extracts people_id to retrieve person data
3. Get Person Data
Queries: db_people table using the people_id
Retrieves all person fields:
- Name fields (first name, last name, cleaned versions)
- person_gender, person_language
- Birth information (year, date, estimated year)
- Location (country, city, state)
- LinkedIn CV and volunteering experience
- Career dates (education start, first job start)
- contact_location, contact_academic_title
- person_native_german, person_scooling_country
- LinkedIn profile image URL
- LinkedIn followers and connections count
- Timestamps: created_at, updated_at
4. Get People Identifiers
Queries: db_people_identifiers table
Retrieves social profile identifiers
Organizes them into arrays:
- contact_linkedins: Array of all LinkedIn URLs
- contact_xings: Array of all Xing URLs
5. Get Lead Identifiers
Queries: db_leads_identifiers table
Retrieves email identifiers with their validation status
Organizes them by status into separate arrays:
- contact_emails_valid: Array of validated emails
- contact_emails_invalid: Array of invalid emails
- contact_emails_catch_all: Array of catch-all emails
- contact_emails_wrong: Array of wrong emails
- contact_emails_unsure: Array of emails with unsure status
6. Get Workspace Data (if workspace_id provided)
Queries: db_leads_workspace table
Retrieves workspace-specific lead data:
- lead_workspace_id: The ID of the workspace connection
- lead_qualified_ws: Qualification status in this workspace
- Timestamps: workspace created_at, updated_at
If not provided or not found: Returns null values for workspace fields
7. Return Response
Returns: Complete contact data dictionary with:
- All lead fields
- All person fields
- People identifiers in arrays
- Lead identifiers organized by email status
- Workspace data (if workspace_id was provided)
- Timestamps renamed to indicate source table
Error Handling in API File
The process_contact_get() function in api_contact_get.py uses try/except to handle errors:
Try Block:
- Validates that lead_id is provided (if not: returns error "lead_id is required")
- Gets database connection from pool
- Queries db_leads table for lead data
- If lead not found: Returns error "Lead not found"
- Extracts people_id from lead data
- Queries db_people table using people_id
- Queries db_people_identifiers and organizes by type
- Queries db_leads_identifiers and organizes by email status
- If workspace_id provided: Queries db_leads_workspace table
- Returns complete contact data dictionary
Except Block:
- Catches any unexpected errors during database queries or data processing
- Returns error response with error message
- Examples of caught errors:
- Database connection fails
- Query execution errors
- Data formatting errors
Note: No cleaning functions are called (get endpoints don't need cleaning). Database connection is managed internally.
Why Use Contact Get?
- Contact Lookup tells you IF a contact exists and returns just the IDs
- Contact Get gives you ALL the information about a contact once you know the lead_id
- Use lookup first to find the contact, then use get to retrieve all details
- Contact Get requires lead_id (not person_id) because it's designed to get the full context of a person's role at a specific company
Why Lead ID and Not Person ID?
Contact Get requires a lead_id because:
- It's designed to show a person in the context of a specific company
- One person can work at multiple companies (multiple leads)
- If you used person_id, the API wouldn't know which company context to show
Example:
- John Smith (person) works at Company A as CEO (lead 1)
- John Smith (same person) works at Company B as Advisor (lead 2)
- Calling contact_get with lead 1's ID shows John's CEO role at Company A
- Calling contact_get with lead 2's ID shows John's Advisor role at Company B
Error Handling
Scenario: Lead doesn't exist
{"error": "Lead not found", "lead_id": "xxx"}
This happens when the lead_id doesn't exist in the database
Scenario: Lead exists but person data missing
Complete lead data with null values for all person fields
This is rare but can happen if data integrity issues exist
Scenario: Lead exists but not in workspace
Complete contact data with workspace fields as null
Example: lead_workspace_id: null, lead_qualified_ws: null
Scenario: Database error
{"error": "Database error: [error details]", "lead_id": "xxx"}
Scenario: Empty identifier arrays
contact_linkedins: []
Always returns arrays (even if empty) - This is normal; not all contacts have all types of identifiers
Practical Example
Request:
GET /contact_get?lead_id=lead-789-xyz&workspace_id=workspace-456
Response:
{
"people_id": "person-abc-123",
"companies_main_id": "company-xyz-789",
"lead_position": "Chief Executive Officer",
"lead_position_cleaned": "CEO",
"lead_seniority": "C-Level",
"lead_departement": "Executive",
"still_at_company": true,
"lead_start_date": "2020-01-01",
"lead_end_date": null,
"lead_seniority_enum": "c_level",
"lead_departement_enum": "executive",
"lead_position_clean_plural_dativ": "CEOs",
"lead_position_clean_plural_nominativ": "CEOs",
"lead_summary": "Experienced executive in AI industry",
"lead_sources": ["LinkedIn", "Company Website"],
"db_leads_created_at": "2024-01-10T11:00:00Z",
"db_leads_updated_at": "2024-03-18T15:30:00Z",
"person_first_name": "John",
"contact_first_name_cleaned": "John",
"person_last_name": "Smith",
"contact_last_name_cleaned": "Smith",
"person_gender": "male",
"person_language": "English",
"contact_estimated_birth_year": "1980",
"contact_birth_year": "1980",
"contact_birth_date": "1980-05-15",
"person_country": "USA",
"person_city": "San Francisco",
"linkedin_cv": "Detailed career history...",
"linkedin_volunteerings": "Board member at...",
"started_education_linkedin": "1998",
"first_job_start_linkedin": "2002",
"contact_location": "San Francisco, CA",
"contact_academic_title": "PhD",
"person_state": "California",
"person_native_german": false,
"person_scooling_country": "USA",
"contact_linkedin_image_url": "https://...",
"person_linkedin_followers": "5000",
"person_linkedin_connections": "500+",
"db_people_created_at": "2024-01-05T09:00:00Z",
"db_people_updated_at": "2024-03-10T14:00:00Z",
"contact_linkedins": [
"linkedin.com/in/johnsmith",
"linkedin.com/in/john-smith-ceo"
],
"contact_xings": [],
"contact_emails_valid": [
"john@company.com",
"john.smith@company.com"
],
"contact_emails_invalid": [],
"contact_emails_catch_all": [
"info@company.com"
],
"contact_emails_wrong": [],
"contact_emails_unsure": [
"j.smith@company.com"
],
"lead_workspace_id": "lead-ws-xyz-123",
"lead_qualified_ws": "qualified",
"db_leads_workspace_created_at": "2024-02-05T10:00:00Z",
"db_leads_workspace_updated_at": "2024-03-12T16:45:00Z"
}
/company_push
Purpose: Look up company and automatically create it if not found. Also creates workspace connection if workspace_id is provided and connection doesn't exist.
Logic Flow
1. Receive Request Data
Accepts: All company fields, workspace_id
2. Set Connection Pools
Sets connection pool for: company_lookup, company_push, company_workspace_push
3. Clean the Data
Calls: clean_company_fields() from processing_cleaning_funtions/company_field_cleaning.py
What it does: Normalizes company data
4. Lookup Company
Calls: lookup_company() from processing_lookup_funtions/company_lookup.py
NO CACHING - Always fresh lookup to determine correct status
Returns: company_main_id and company_workspace_id (if found)
5. Determine Company Status
If company NOT found:
- Calls: push_company() from processing_push_funtions/company_push.py
- What it does: Creates new company record in db_companies_main and identifiers in db_companies_dt_identifiers
- Sets status_company = "created"
- Updates company_main_id with newly created ID
If company found:
- Sets status_company = "found"
- No push operation needed
6. Determine Workspace Status (if workspace_id provided and company exists)
If workspace connection NOT found:
- Calls: push_company_workspace() from processing_push_funtions/company_workspace_push.py
- What it does: Creates workspace connection in db_companies_workspace
- Sets status_company_workspace = "created"
- Updates company_workspace_id with newly created ID
If workspace connection found:
- Sets status_company_workspace = "found"
- No push operation needed
7. Return Response
Returns: {company_main_id, company_workspace_id, status_company, status_company_workspace, error}
- status_company: "found" or "created"
- status_company_workspace: "found" or "created" (or null if no workspace_id provided)
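The find-or-create flow of steps 4 to 7 can be sketched as below. The function push_company_sketch and its injected callables (lookup, create_company, create_workspace_link) are hypothetical stand-ins for lookup_company(), push_company(), and push_company_workspace(); their real signatures are not shown in this document, and error handling is left out.

```python
from typing import Callable


def push_company_sketch(
    data: dict,
    lookup: Callable[[dict], dict],
    create_company: Callable[[dict], str],
    create_workspace_link: Callable[[str, str], str],
) -> dict:
    """Find-or-create a company and, optionally, its workspace connection."""
    workspace_id = data.get("workspace_id")
    found = lookup(data)  # always a fresh lookup, never cached
    company_id = found.get("company_main_id")
    link_id = found.get("company_workspace_id")

    status_company = "found"
    if company_id is None:
        company_id = create_company(data)
        status_company = "created"

    status_workspace = None  # stays null when no workspace_id is provided
    if workspace_id is not None:
        status_workspace = "found"
        if link_id is None:
            link_id = create_workspace_link(company_id, workspace_id)
            status_workspace = "created"

    return {
        "company_main_id": company_id,
        "company_workspace_id": link_id,
        "status_company": status_company,
        "status_company_workspace": status_workspace,
        "error": None,
    }
```

Calling the function twice with the same input yields "created" on the first call and "found" on the second, which is why the lookup must not be served from cache.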
Error Handling in API File
The process_company_push() function handles errors at multiple stages:
Validation:
- If no data provided: Returns error "No data provided"
Lookup Error:
- If lookup_company() returns error: Returns error from lookup function
Push Errors:
- If push_company() fails: Returns error "company_push failed" with details
- If push_company_workspace() fails: Returns company_main_id and status_company, but error for workspace operation
Called Functions (Brief Description):
- clean_company_fields(): Normalizes company data
- lookup_company(): Searches database for company
- push_company(): Creates new company record with identifiers, returns company_id
- push_company_workspace(): Creates workspace connection, returns workspace connection id
/contact_push
Purpose: Look up contact and automatically create person, lead, and workspace connections if not found.
Logic Flow
1. Receive Request Data
Accepts: All contact fields, company_id, workspace_id
2. Set Connection Pools
Sets connection pool for: contact_lookup, people_push, lead_push, people_workspace_push, lead_workspace_push
3. Extract IDs
Extracts company_id and workspace_id from data
4. Clean the Data
Calls: clean_contact_fields() from processing_cleaning_funtions/contact_field_cleaning.py
What it does: Normalizes contact data
5. Lookup Contact
Calls: lookup_contact() from processing_lookup_funtions/contact_lookup.py
NO CACHING - Always fresh lookup
Returns: lead_id, person_id, lead_workspace_id, people_workspace_id
6. Determine Person Status
If person NOT found:
- Calls: create_person_record() from processing_push_funtions/people_push.py
- What it does: Creates new person record in db_people and identifiers in db_people_identifiers
- Sets status_person = "created"
If person found:
- Sets status_person = "found"
7. Determine Lead Status (if person_id and company_id exist)
If lead NOT found:
- Calls: create_lead_record() from processing_push_funtions/lead_push.py
- What it does: Creates new lead record in db_leads connecting person to company, creates email identifiers in db_leads_identifiers
- Sets status_lead = "created"
If lead found:
- Sets status_lead = "found"
8. Determine People Workspace Status (if workspace_id and person_id exist)
If people workspace NOT found:
- Calls: create_people_workspace_connection() from processing_push_funtions/people_workspace_push.py
- What it does: Creates workspace connection in db_people_workspace
- Sets status_people_workspace = "created"
If people workspace found:
- Sets status_people_workspace = "found"
9. Determine Lead Workspace Status (if workspace_id and lead_id exist)
If lead workspace NOT found:
- Calls: create_lead_workspace_connection() from processing_push_funtions/lead_workspace_push.py
- What it does: Creates workspace connection in db_leads_workspace
- Sets status_lead_workspace = "created"
If lead workspace found:
- Sets status_lead_workspace = "found"
10. Return Response
Returns: {person_id, lead_id, people_workspace_id, lead_workspace_id, status_person, status_lead, status_people_workspace, status_lead_workspace, error}
All status values: "found" or "created"
Error Handling in API File
The process_contact_push() function handles errors at multiple stages:
Validation:
- If no data provided: Returns error "No data provided"
Lookup Error:
- If lookup_contact() returns error: Returns all null IDs with error
Push Errors (cascading returns):
- If people_push fails: Returns error immediately
- If lead_push fails: Returns person_id and status_person, but error for lead
- If people_workspace_push fails: Returns person_id, lead_id, and statuses, but error for people workspace
- If lead_workspace_push fails: Returns all IDs and statuses except lead workspace
Called Functions (Brief Description):
- clean_contact_fields(): Normalizes contact data
- lookup_contact(): Searches database for contact using two-phase lookup
- create_person_record(): Creates person in db_people with identifiers
- create_lead_record(): Creates lead in db_leads with email identifiers
- create_people_workspace_connection(): Creates workspace link for person
- create_lead_workspace_connection(): Creates workspace link for lead
/company_push_patch
Purpose: Look up company and CREATE if not found OR UPDATE if found. Same for workspace connections.
Logic Flow
1. Receive Request Data
Accepts: All company fields, workspace_id
2. Set Connection Pools
Sets connection pool for: company_lookup, company_push, company_workspace_push, company_update, company_workspace_update
3. Clean the Data
Calls: clean_company_fields()
What it does: Normalizes company data
4. Lookup Company
Calls: lookup_company()
NO CACHING - Always fresh lookup
5. Determine Company Status and Action
If company NOT found:
- Calls: push_company() from processing_push_funtions/company_push.py
- What it does: Creates new company record
- Sets status_company = "created"
If company found:
- Calls: update_company_identifiers() from processing_update_funtions/company_update.py
- What it does: Updates existing company fields and adds new identifiers (non-destructive)
- Sets status_company = "updated"
6. Determine Workspace Status and Action (if workspace_id and company_main_id exist)
If workspace connection NOT found:
- Calls: push_company_workspace() from processing_push_funtions/company_workspace_push.py
- What it does: Creates workspace connection
- Sets status_company_workspace = "created"
If workspace connection found:
- Calls: update_company_workspace() from processing_update_funtions/company_workspace_update.py
- What it does: Updates workspace connection fields (non-destructive)
- Sets status_company_workspace = "updated"
7. Return Response
Returns: {company_main_id, company_workspace_id, status_company, status_company_workspace, error}
- status_company: "created" or "updated"
- status_company_workspace: "created" or "updated"
Error Handling in API File
The process_company_push_patch() function handles errors at multiple stages:
Validation:
- If no data provided: Returns error "No data provided"
Lookup Error:
- If lookup_company() returns error: Returns error from lookup
Push/Update Errors:
- If company_push() fails: Returns error "company_push failed"
- If company_update() fails: Returns company_main_id but error for update
- If company_workspace_push() fails: Returns company info but error for workspace
- If company_workspace_update() fails: Returns company info but error for workspace update
Called Functions (Brief Description):
- clean_company_fields(): Normalizes company data
- lookup_company(): Searches database for company
- push_company(): Creates new company with identifiers
- update_company_identifiers(): Updates company fields and adds new identifiers without removing existing data
- push_company_workspace(): Creates workspace connection
- update_company_workspace(): Updates workspace fields without removing existing data
/contact_push_patch
Purpose: Look up contact and CREATE if not found OR UPDATE if found for person, lead, and workspace connections.
Logic Flow
1. Receive Request Data
Accepts: All contact fields, company_id, workspace_id
2. Set Connection Pools
Sets connection pool for: contact_lookup, people_push, lead_push, people_workspace_push, lead_workspace_push, people_update, lead_update, people_workspace_update, lead_workspace_update
3. Extract IDs and Clean Data
Extracts company_id and workspace_id
Calls: clean_contact_fields()
4. Lookup Contact
Calls: lookup_contact()
NO CACHING
5. Determine Person Status and Action
If person NOT found:
- Calls: create_person_record() from processing_push_funtions/people_push.py
- Sets status_person = "created"
If person found:
- Calls: update_person_identifiers() from processing_update_funtions/people_update.py
- What it does: Updates person fields and adds new identifiers (non-destructive)
- Sets status_person = "updated"
6. Determine Lead Status and Action (if person_id and company_id exist)
If lead NOT found:
- Calls: create_lead_record() from processing_push_funtions/lead_push.py
- Sets status_lead = "created"
If lead found:
- Calls: update_lead_identifiers() from processing_update_funtions/lead_update.py
- What it does: Updates lead fields and adds new email identifiers (non-destructive)
- Sets status_lead = "updated"
7. Determine People Workspace Status and Action (if workspace_id and person_id exist)
If people workspace NOT found:
- Calls: create_people_workspace_connection() from processing_push_funtions/people_workspace_push.py
- Sets status_people_workspace = "created"
If people workspace found:
- Calls: update_people_workspace() from processing_update_funtions/people_workspace_update.py
- What it does: Updates workspace fields (non-destructive)
- Sets status_people_workspace = "updated"
8. Determine Lead Workspace Status and Action (if workspace_id and lead_id exist)
If lead workspace NOT found:
- Calls: create_lead_workspace_connection() from processing_push_funtions/lead_workspace_push.py
- Sets status_lead_workspace = "created"
If lead workspace found:
- Calls: update_lead_workspace() from processing_update_funtions/lead_workspace_update.py
- What it does: Updates workspace qualification status
- Sets status_lead_workspace = "updated"
9. Return Response
Returns: {person_id, lead_id, people_workspace_id, lead_workspace_id, status_person, status_lead, status_people_workspace, status_lead_workspace, error}
All status values: "created" or "updated"
Error Handling in API File
The process_contact_push_patch() function handles errors at multiple stages:
Validation:
- If no data provided: Returns error "No data provided"
Lookup Error:
- If lookup_contact() returns error: Returns all null IDs with error
Push/Update Errors (cascading returns):
- If people_push/update fails: Returns error immediately
- If lead_push/update fails: Returns person info but error for lead
- If people_workspace_push/update fails: Returns person and lead info but error for people workspace
- If lead_workspace_push/update fails: Returns all info except lead workspace
Called Functions (Brief Description):
- clean_contact_fields(): Normalizes contact data
- lookup_contact(): Searches database for contact
- create_person_record(): Creates person with identifiers
- update_person_identifiers(): Updates person fields and adds new identifiers
- create_lead_record(): Creates lead with email identifiers
- update_lead_identifiers(): Updates lead fields and adds new email identifiers
- create_people_workspace_connection(): Creates workspace link for person
- update_people_workspace(): Updates people workspace fields
- create_lead_workspace_connection(): Creates workspace link for lead
- update_lead_workspace(): Updates lead workspace qualification status
/company_delete_fields
Purpose: Delete specific fields from company records in db_companies_main, db_companies_dt_identifiers, and db_companies_workspace.
Logic Flow
1. Receive Request Data
- Required: company_id
- Optional: Fields to delete (each field as boolean true or with value)
2. Validate Data
- Checks that company_id is provided
- Returns error if company_id missing
3. Get Database Connection
- Gets connection from pool
- Creates cursor for queries
4. Part 1: Handle Boolean Fields
- For fields marked as true in request
- Sets fields to NULL in db_companies_main
- Fields: company_name_cleaned, legal_form, address fields, registration numbers, employee counts, etc.
- Executes:
UPDATE db_companies_main SET field = NULL WHERE id = company_id
5. Part 2: Handle Array Fields
- For array fields with values (company_tags, company_sources)
- Parses comma-separated values
- Removes each item from array
- Executes:
UPDATE db_companies_main SET field = array_remove(field, item) WHERE id = company_id
6. Part 3: Handle Identifier Fields
- For identifier fields with values (name, domain, linkedin, emails, phones, social media)
- Deletes matching records from db_companies_dt_identifiers
- Executes:
DELETE FROM db_companies_dt_identifiers WHERE companies_main_id = company_id AND identifier = value AND type = type
7. Part 4: Handle Workspace Fields (if company_workspace_id provided)
- For company_qualified field: Sets to NULL
- For company_custom_tags_ws: Removes items from array
- Executes on db_companies_workspace table
8. Commit and Return
- Commits all changes
- Returns success with list of operations performed
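Parts 1 to 3 of the flow above can be sketched as a function that turns a request payload into parameterized SQL statements. This is a minimal illustration under stated assumptions, not the endpoint's actual code: the function name `plan_company_deletions`, the short field whitelists, and the mapping from request field to identifier type are all hypothetical, and the real field lists are longer.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical whitelists; the real field lists are longer than shown here.
BOOLEAN_FIELDS = {"company_name_cleaned", "company_legal_form", "company_city"}
ARRAY_FIELDS = {"company_tags", "company_sources"}
# Hypothetical mapping from request field to the identifier "type" value.
IDENTIFIER_TYPES = {"company_domain": "domain", "company_linkedin": "linkedin"}

Statement = Tuple[str, Tuple[Any, ...]]


def plan_company_deletions(payload: Dict[str, Any]) -> List[Statement]:
    """Translate a delete request into parameterized SQL statements."""
    company_id = payload.get("company_id")
    if not company_id:
        raise ValueError("company_id is required")

    statements: List[Statement] = []
    for field, value in payload.items():
        if field in BOOLEAN_FIELDS and value is True:
            # Part 1: clear the column. The name comes from the whitelist,
            # never from raw user input, so interpolating it is safe.
            statements.append(
                (f"UPDATE db_companies_main SET {field} = NULL WHERE id = %s",
                 (company_id,))
            )
        elif field in ARRAY_FIELDS and isinstance(value, str):
            # Part 2: remove each comma-separated item from the array.
            for item in (part.strip() for part in value.split(",")):
                if item:
                    statements.append(
                        (f"UPDATE db_companies_main SET {field} = "
                         f"array_remove({field}, %s) WHERE id = %s",
                         (item, company_id))
                    )
        elif field in IDENTIFIER_TYPES and value:
            # Part 3: delete the matching identifier row.
            statements.append(
                ("DELETE FROM db_companies_dt_identifiers "
                 "WHERE companies_main_id = %s AND identifier = %s AND type = %s",
                 (company_id, value, IDENTIFIER_TYPES[field]))
            )
    return statements
```

Keeping column names in a whitelist and passing every value as a query parameter means no request data is ever concatenated into the SQL text.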
Error Handling in API File
The process_company_delete_fields() function uses try/except with rollback:
Try Block:
- Validates company_id is provided
- Gets database connection from pool
- Executes multiple UPDATE and DELETE queries
- Tracks operations performed
- Commits transaction
Except Block - Database Errors:
- Catches psycopg2.Error (database-specific errors)
- Rolls back transaction if error occurs
- Closes cursor if open
- Releases connection back to pool
- Returns:
(False, "Database error: [error details]")
Except Block - General Errors:
- Catches any other Exception
- Rolls back transaction if error occurs
- Closes cursor if open
- Releases connection back to pool
- Returns:
(False, "Error: [error details]")
Finally Logic (implicit in except blocks):
- Always attempts to close cursor
- Always attempts to rollback on error
- Always releases connection back to pool
No Called Processing Functions: This file directly executes database queries without calling separate processing modules.
/contact_delete_fields
Purpose: Delete specific fields from contact records in db_people, db_leads, db_people_identifiers, db_leads_identifiers, and workspace tables.
Logic Flow
1. Receive Request Data
- Required: At least one of people_id or lead_id
- Optional: people_workspace_id, leads_workspace_id, fields to delete
2. Validate Data
- Checks that at least people_id OR lead_id is provided
- Returns error if both missing
3. Get Database Connection
- Gets connection from pool
- Creates cursor for queries
4. Part 1: Handle People Boolean Fields (if people_id provided)
- For fields marked as true in request
- Sets fields to NULL in db_people
- Fields: name, gender, language, birth info, location, LinkedIn data, etc.
- Executes:
UPDATE db_people SET field = NULL WHERE id = people_id
5. Part 2: Handle Lead Boolean Fields (if lead_id provided)
- For fields marked as true in request
- Sets fields to NULL in db_leads
- Fields: position, seniority, department, dates, summary, etc.
- Executes:
UPDATE db_leads SET field = NULL WHERE id = lead_id
6. Part 3: Handle Lead Array Fields (if lead_id provided)
- For lead_sources array field
- Parses comma-separated values
- Removes each item from array
- Executes:
UPDATE db_leads SET lead_sources = array_remove(lead_sources, item) WHERE id = lead_id
7. Part 4: Handle People Identifiers (if people_id provided)
- For identifier fields with values (contact_linkedin, contact_xing)
- Deletes matching records from db_people_identifiers
- Executes:
DELETE FROM db_people_identifiers WHERE people_id = people_id AND contact_ident_identifier = value AND contact_ident_type = type
8. Part 5: Handle Lead Identifiers (if lead_id provided)
- For email identifier fields with values (valid, catch_all, invalid, unsure)
- Deletes matching records from db_leads_identifiers
- Executes:
DELETE FROM db_leads_identifiers WHERE leads_id = lead_id AND lead_ident_identifier = email AND lead_ident_type = 'email' AND lead_ident_status = status
9. Part 6: Handle People Workspace (if people_workspace_id provided)
- Note: db_people_workspace only contains IDs, no other fields to delete
- Logs operation performed
10. Part 7: Handle Leads Workspace (if leads_workspace_id provided)
- For lead_qualified_ws field: Sets to NULL
- Executes:
UPDATE db_leads_workspace SET lead_qualified_ws = NULL WHERE id = leads_workspace_id
11. Commit and Return
- Commits all changes
- Returns success with list of operations performed
Error Handling in API File
The process_contact_delete_fields() function uses try/except with rollback:
Try Block:
- Validates at least one ID is provided
- Gets database connection from pool
- Executes multiple UPDATE and DELETE queries across multiple tables
- Tracks operations performed
- Commits transaction
Except Block - Database Errors:
- Catches psycopg2.Error (database-specific errors)
- Rolls back transaction if error occurs
- Closes cursor if open
- Releases connection back to pool
- Returns:
(False, "Database error: [error details]")
Except Block - General Errors:
- Catches any other Exception
- Rolls back transaction if error occurs
- Closes cursor if open
- Releases connection back to pool
- Returns:
(False, "Error: [error details]")
Finally Logic (implicit in except blocks):
- Always attempts to close cursor
- Always attempts to rollback on error
- Always releases connection back to pool
No Called Processing Functions: This file directly executes database queries without calling separate processing modules.
Key Differences Between Endpoints
Push vs Push/Patch
Push Endpoints
- Found: Returns "found" status, NO update
- Not Found: Creates new record, returns "created" status
Push/Patch Endpoints
- Found: Updates existing record, returns "updated" status
- Not Found: Creates new record, returns "created" status
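The difference between the two endpoint families can be sketched as a single helper. This is only an illustration of the found/created/updated decision: `resolve_record`, `create`, and `update` are hypothetical names, not functions from the codebase.

```python
from typing import Callable, Optional, Tuple


def resolve_record(
    existing_id: Optional[str],
    create: Callable[[], str],
    update: Optional[Callable[[str], None]] = None,
) -> Tuple[str, str]:
    """Return (record_id, status) for one entity.

    Hypothetical helper: passing `update` models a push/patch endpoint,
    omitting it models a plain push endpoint.
    """
    if existing_id is None:
        # Not found: both endpoint families create the record.
        return create(), "created"
    if update is None:
        # Plain push: the existing record is left untouched.
        return existing_id, "found"
    # Push/patch: the existing record is updated in place.
    update(existing_id)
    return existing_id, "updated"
```

Calling `resolve_record("abc", create_fn)` models a push endpoint and yields the "found" status, while `resolve_record("abc", create_fn, update_fn)` models push/patch and yields "updated".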
Update Behavior (Push/Patch only)
All update functions are non-destructive:
- Existing fields: Keep current values
- New fields: Add new values
- Arrays: Append new items (don't remove existing)
- Identifiers: Add new identifiers (don't remove existing)
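One way to read these rules is as a merge in which stored values always win. The sketch below assumes that "keep current values" means a non-empty stored field is never overwritten, and that array items are de-duplicated on append; both are assumptions, and `merge_non_destructive` is a hypothetical name.

```python
from typing import Any, Dict


def merge_non_destructive(existing: Dict[str, Any], incoming: Dict[str, Any]) -> Dict[str, Any]:
    """Merge `incoming` into `existing` without losing stored data."""
    merged = dict(existing)
    for key, value in incoming.items():
        if value is None:
            continue  # nothing to add
        current = merged.get(key)
        if isinstance(current, list):
            # Arrays: append items that are not stored yet, keep the rest.
            new_items = value if isinstance(value, list) else [value]
            merged[key] = current + [v for v in new_items if v not in current]
        elif current is None:
            # Empty or missing field: take the new value.
            merged[key] = value
        # Otherwise the field already has a value and is left as-is.
    return merged
```

With a stored city of "Berlin" and an incoming city of "Munich", the merged record keeps "Berlin"; an incoming tag list only contributes tags that are not already present.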
Delete Behavior
Delete endpoints are selective:
- Boolean fields with value true: Set to NULL
- Array fields with values: Remove only specified items
- Identifier fields with values: Delete only specified identifiers
- Uses explicit DELETE queries for identifier tables
Error Handling Patterns
Push/Push-Patch Files
Pattern: Cascading returns with partial success
- If early operation fails: Return error immediately
- If later operation fails: Return successful IDs but error for failed operation
Example:
If person created but lead fails, return person_id and status_person, but error for lead
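The cascading pattern can be sketched with two dependent steps. This is an illustration only: `run_cascade` and the `Step` callables are hypothetical, the error message wording is assumed, and the real endpoints chain four steps rather than two.

```python
from typing import Callable, Dict, Optional, Tuple

# Each step returns (record_id, error_message); exactly one of them is None.
Step = Callable[[], Tuple[Optional[str], Optional[str]]]


def run_cascade(create_person: Step, create_lead: Step) -> Dict[str, Optional[str]]:
    """Run two dependent steps and keep whatever succeeded before a failure."""
    result: Dict[str, Optional[str]] = {
        "person_id": None, "status_person": None,
        "lead_id": None, "status_lead": None,
        "error": None,
    }

    person_id, error = create_person()
    if error:
        # Early failure: nothing to report except the error.
        result["error"] = f"people_push failed: {error}"
        return result
    result.update(person_id=person_id, status_person="created")

    lead_id, error = create_lead()
    if error:
        # Later failure: the person fields stay in the response.
        result["error"] = f"lead_push failed: {error}"
        return result
    result.update(lead_id=lead_id, status_lead="created")
    return result
```

A caller should therefore check the error field and the individual IDs separately, since a response can carry both a valid person_id and an error.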
Delete Files
Pattern: All-or-nothing transaction with rollback
- If any operation fails: Rollback entire transaction
- Return success only if all operations commit
- Uses explicit transaction management
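The all-or-nothing behavior can be demonstrated with a small runnable sketch. To stay self-contained it uses Python's built-in sqlite3 module as a stand-in for psycopg2 and the connection pool, so the placeholder syntax (`?` instead of `%s`) and the exception class differ from the real files; `run_all_or_nothing` is a hypothetical name.

```python
import sqlite3
from typing import List, Tuple


def run_all_or_nothing(conn: sqlite3.Connection, statements: List[Tuple[str, tuple]]) -> Tuple[bool, str]:
    """Execute every statement in one transaction; undo all of them on failure."""
    cursor = conn.cursor()
    try:
        for sql, params in statements:
            cursor.execute(sql, params)
        conn.commit()
        return True, f"{len(statements)} operations committed"
    except sqlite3.Error as exc:
        # Any failure discards the changes made by earlier statements.
        conn.rollback()
        return False, f"Database error: {exc}"
    finally:
        cursor.close()


# Demo data: one company row in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE db_companies_main (id TEXT PRIMARY KEY, company_city TEXT)")
conn.execute("INSERT INTO db_companies_main VALUES ('c-1', 'Berlin')")
conn.commit()

ok, message = run_all_or_nothing(conn, [
    ("UPDATE db_companies_main SET company_city = NULL WHERE id = ?", ("c-1",)),
    ("UPDATE missing_table SET x = NULL WHERE id = ?", ("c-1",)),  # fails on purpose
])
city = conn.execute("SELECT company_city FROM db_companies_main").fetchone()[0]
```

Because the second statement fails, the first update is rolled back as well and the stored city is unchanged.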
Connection Management
All Files:
- Get connection from shared pool
- Always release connection back to pool (even on error)
- Use try/except/finally pattern for cleanup
Summary: Lookup vs Get
Lookup Endpoints (Search)
- Purpose: Find if something exists
- Input: Identifiers (name, domain, email, etc.)
- Output: IDs only
- Use Case: "Does this company/contact exist in our database?"
- Caching: Yes (1 hour)
- Returns null when not found: This is normal, not an error
Get Endpoints (Retrieve)
- Purpose: Get complete information
- Input: ID (company_id or lead_id)
- Output: All data fields, all identifiers, workspace connections
- Use Case: "Give me everything you know about this company/contact"
- Caching: No
- Returns error when not found: ID doesn't exist in database
Typical Workflows
Workflow 1: Check if company exists, then get details
1. POST /company_lookup with domain="anthropic.com"
2. Response: {company_main_id: "uuid-123", ...}
3. GET /company_get?company_id=uuid-123
4. Response: {complete company data...}
Workflow 2: Search for contact, then get full info
1. POST /contact_lookup with email="john@company.com" and company_id="company-uuid"
2. Response: {lead_id: "lead-456", person_id: "person-789", ...}
3. GET /contact_get?lead_id=lead-456
4. Response: {complete contact data including lead and person info...}
Workflow 3: Check workspace membership
1. POST /company_lookup with domain="example.com" and workspace_id="ws-123"
2. Response: {company_main_id: "uuid-123", company_workspace_id: null, ...}
3. Interpretation: Company exists but is NOT in workspace ws-123
4. Next step: Use company_push to add it to the workspace
Common Questions
Q: When should I use workspace_id?
Use workspace_id in lookups when:
- You want to check if a company/contact is already in a specific workspace
- You're building workspace-specific features (like workspace dashboards)
Use workspace_id in get calls when:
- You want workspace-specific data (qualification status, custom tags)
- You're displaying data in a workspace context
Don't use workspace_id when:
- You're doing a global search across all workspaces
- You want to find all instances of a company/contact regardless of workspace
Q: Why does contact lookup return person_id without lead_id sometimes?
This happens in Phase 2 of contact lookup:
- You searched for a person (by LinkedIn/Xing) without providing company_id, OR
- You provided company_id but the person doesn't work at that company
- The system found the person in the database but couldn't find a lead (company connection)
This means: "We know this person exists, but we don't have employment data for the company you specified"
Q: What's the difference between company_name and company_name_cleaned?
- company_name: Stored in db_companies_dt_identifiers; can have multiple variations (old names, alternate spellings)
- company_name_cleaned: Stored in db_companies_main; the current, standardized company name
Example:
- company_names array: ["Facebook", "Facebook Inc", "Meta"]
- company_name_cleaned: "Meta Platforms Inc"
Q: Why are identifiers in separate arrays by type?
For historical tracking:
- Company domains change (rebrandings, mergers)
- People change names (marriage, legal name changes)
- Email addresses change (job changes)
For matching flexibility:
- Search with old domain, find company by current domain
- Search with old email, find person by current email
- Match against ANY stored identifier
Q: Can I search for a person without knowing their company?
Yes, in contact_lookup:
- Don't provide company_id
- Provide LinkedIn or Xing URL
- The lookup will run Phase 2 (person-only search)
- You'll get person_id but not lead_id
This means: "Found the person, but don't know where they work"
Q: What happens if I provide multiple identifiers that match different companies?
The system returns the first match found:
- Example: domain="company-a.com", linkedin="linkedin.com/company/company-b"
- If company A is found first in database, it returns company A
- No ranking or scoring is performed
Best practice: Use the most specific/reliable identifier you have
Database Tables Reference
Company Tables
- v_company_lookup: Optimized view for searching companies (contains arrays of identifiers)
- db_companies_main: Main company data (single record per company)
- db_companies_dt_identifiers: Company identifiers (multiple records per company, one per identifier)
- db_companies_workspace: Workspace connections (one record per company-workspace pair)
Contact Tables
- v_contact_lookup: Optimized view for searching contacts (contains arrays of identifiers)
- db_leads: Lead data, person at company (one record per person-company relationship)
- db_people: Person data, individual information (single record per person)
- db_people_identifiers: Person identifiers like LinkedIn/Xing (multiple records per person)
- db_leads_identifiers: Lead identifiers like emails (multiple records per lead)
- db_leads_workspace: Lead workspace connections (one record per lead-workspace pair)
- db_people_workspace: Person workspace connections (one record per person-workspace pair)
Table Relationships
Company Structure:
db_companies_main (1) ←→ (many) db_companies_dt_identifiers
db_companies_main (1) ←→ (many) db_companies_workspace
Contact Structure:
db_people (1) ←→ (many) db_people_identifiers
db_people (1) ←→ (many) db_people_workspace
db_people (1) ←→ (many) db_leads
db_leads (1) ←→ (many) db_leads_identifiers
db_leads (1) ←→ (many) db_leads_workspace
db_leads (many) ←→ (1) db_companies_main
Key Points:
- One person can have many leads (worked at multiple companies)
- One lead belongs to one person and one company
- Identifiers are stored separately for historical tracking
- Workspace connections are separate for each entity
Functions
Processing functions used across API endpoints for data cleaning, validation, and normalization.
This document explains the logic flow for all processing functions organized by functional area. Each section describes what each function does, its complete logic flow, database operations, error handling, and return values.
1. Processing Cleaning Functions
1.1 Contact Field Cleaning (contact_field_cleaning.py)
Purpose:
Normalizes and validates contact/lead field data before database operations.
normalize_quotes(text: str) → str
Purpose:
Converts all Unicode quote characters to standard ASCII quotes.
Logic Flow:
- 1. Check if input is string type, return as-is if not
- 2. Replace all smart quotes and quote-like characters:
- • Right single quotation mark (U+2019) → apostrophe
- • Left single quotation mark (U+2018) → apostrophe
- • Grave accent/backtick (U+0060) → apostrophe
- • Acute accent (U+00B4) → apostrophe
- • Left double quotation mark (U+201C) → double quote
- • Right double quotation mark (U+201D) → double quote
- 3. Return normalized text
Database Operations:
None
Return Value:
Normalized string with standard ASCII quotes
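The replacement list above maps directly onto a translation table. The following is a minimal sketch of how such a function could look, not a copy of the code in contact_field_cleaning.py; the `_QUOTE_MAP` name is an assumption.

```python
# Translation table built from the replacements listed above.
_QUOTE_MAP = str.maketrans({
    "\u2019": "'",   # right single quotation mark
    "\u2018": "'",   # left single quotation mark
    "\u0060": "'",   # grave accent / backtick
    "\u00b4": "'",   # acute accent
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
})


def normalize_quotes(text):
    """Replace quote-like Unicode characters with ASCII quotes."""
    if not isinstance(text, str):
        return text  # non-strings are returned unchanged
    return text.translate(_QUOTE_MAP)
```

Using `str.translate` handles all six replacements in a single pass over the string.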
clean_contact_fields(contact_data: dict) → dict
Purpose:
Validates and normalizes all contact fields including social media URLs and removes empty values.
Logic Flow:
- 1. Normalize Quotes in All String Fields - Iterate through all fields and call normalize_quotes() for each string value
- 2. Clean LinkedIn URL - Extract person slug, validate format, format as https://www.linkedin.com/in/{slug}
- 3. Clean Xing URL - Extract person slug, validate format, format as https://www.xing.com/people/{slug}
- 4. Remove Empty Fields - Delete all fields with None, empty string, or blank values
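Steps 2 to 4 can be sketched as follows. This is a simplified illustration under several assumptions: the quote normalization of step 1 is omitted, input URLs are assumed to already use the /in/ and /people/ path prefixes, the slug validation rule is invented for the example, and the names `clean_contact_fields_sketch`, `_profile_slug`, and `_SLUG_RE` are hypothetical.

```python
import re
from typing import Any, Dict, Optional
from urllib.parse import urlparse

# Hypothetical slug rule: letters, digits, hyphen, underscore, percent-escapes.
_SLUG_RE = re.compile(r"^[A-Za-z0-9_%-]+$")


def _profile_slug(url: str, prefix: str) -> Optional[str]:
    """Return the path segment that follows `prefix`, or None if it is invalid."""
    # urlparse only fills `path` correctly when a scheme is present.
    parsed = urlparse(url if "://" in url else f"https://{url}")
    segments = [s for s in parsed.path.split("/") if s]
    if len(segments) >= 2 and segments[0] == prefix and _SLUG_RE.match(segments[1]):
        return segments[1]
    return None


def clean_contact_fields_sketch(contact_data: Dict[str, Any]) -> Dict[str, Any]:
    cleaned = dict(contact_data)

    linkedin = cleaned.get("contact_linkedin")
    if isinstance(linkedin, str) and linkedin.strip():
        slug = _profile_slug(linkedin.strip(), "in")
        cleaned["contact_linkedin"] = f"https://www.linkedin.com/in/{slug}" if slug else None

    xing = cleaned.get("contact_xing")
    if isinstance(xing, str) and xing.strip():
        slug = _profile_slug(xing.strip(), "people")
        cleaned["contact_xing"] = f"https://www.xing.com/people/{slug}" if slug else None

    # Drop None, empty, and whitespace-only values.
    return {
        key: value for key, value in cleaned.items()
        if value is not None and not (isinstance(value, str) and not value.strip())
    }
```

Query strings and tracking parameters are discarded because only the path segment after the prefix is reused when the URL is rebuilt.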
Example:
Input: {"contact_linkedin": "https://linkedin.com/in/john-doe?param=123", "contact_email_valid": "john@example.com", "contact_xing": ""}
Output: {"contact_linkedin": "https://www.linkedin.com/in/john-doe", "contact_email_valid": "john@example.com"}
1.2 Company Field Cleaning (company_field_cleaning.py)
Purpose:
Normalizes and validates company field data including domains and social media URLs.
clean_company_fields(company_data: dict) → dict
Purpose:
Validates and normalizes all company fields including domains and social media URLs.
Logic Flow:
- 1. Normalize and Strip All String Fields
- 2. Clean LinkedIn URL - Extract company slug, format as https://www.linkedin.com/company/{slug}
- 3. Clean Domain - Remove prefixes (https://, www.), store only base domain
- 4. Clean Xing URL - Format as https://www.xing.com/pages/{slug}
- 5. Clean Instagram URL - Format as https://www.instagram.com/{slug}
- 6. Clean Facebook URL - Format as https://www.facebook.com/{slug}
- 7. Clean Pinterest URL - Format as https://de.pinterest.com/{slug}
- 8. Clean TikTok URL - Format as https://www.tiktok.com/{slug}
- 9. Clean YouTube URL - Format as https://www.youtube.com/{slug}
- 10. Clean Twitter URL - Format as https://x.com/{slug}
- 11. Remove Empty Fields
Note: Try/except blocks around URL parsing for each platform. Malformed URLs are set to None.
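Step 3 (domain cleaning) together with the malformed-input rule from the note can be sketched like this. It is an assumption-laden illustration: `clean_domain` is a hypothetical name, "base domain" is read as the hostname without a leading www., and the final sanity check is invented for the example.

```python
from typing import Optional
from urllib.parse import urlparse


def clean_domain(raw: Optional[str]) -> Optional[str]:
    """Reduce a URL or hostname to a bare lowercase domain, or None."""
    if not isinstance(raw, str) or not raw.strip():
        return None
    value = raw.strip().lower()
    try:
        # Add a scheme when missing so urlparse puts the host in `netloc`.
        parsed = urlparse(value if "://" in value else f"https://{value}")
        host = parsed.hostname or ""
    except ValueError:
        return None  # malformed input is treated as missing
    if host.startswith("www."):
        host = host[4:]
    # Hypothetical sanity check: a domain needs at least one dot and no spaces.
    return host if "." in host and " " not in host else None
```

Returning None for unusable input lets the final "Remove Empty Fields" step drop the field instead of storing a broken value.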
Field Cleaning Detailed Documentation: For comprehensive field cleaning algorithms and platform-specific rules, see the Field Cleaning section earlier in this documentation.
2. Processing CSV Functions
2.1 Fetch CSV Fields (fetch_csv_fields.py)
Purpose:
Fetches and parses specific rows from CSV files stored in Supabase Storage with header mapping.
Connection Management:
- • _shared_pool: Global variable set by main script
- • set_connection_pool(pool): Sets the shared pool
- • get_pg_connection(): Gets connection from pool
- • release_pg_connection(conn): Returns connection to pool
fetch_csv_batch(file_name, start_row, end_row, csv_header_rows)
Logic Flow:
- 1. Construct File Path - Build path: csv_uploads/{file_name}.csv
- 2. Check File Exists - Query storage.objects table
- 3. Download CSV File - HTTP GET from Supabase Storage
- 4. Parse CSV Content - Create CSV reader from StringIO
- 5. Validate Row Indices - Check bounds and adjust end_row
- 6. Extract Batch Rows - Map column values to header names if mapping provided
- 7. Return Batch Result - Include file_name, total_rows, range, and row data
Database Operations:
- • Table: storage.objects
- • Query: SELECT to check file existence
- • Connection: Uses shared pool, released after query
Error Scenarios:
- • Database Error: Returns error message, releases connection
- • File Not Found: Returns error "File not found in storage"
- • HTTP Error: Returns error with status code
- • Invalid Row Index: Returns error with valid range
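Steps 4 to 7 of the flow (parsing, bounds checking, header mapping, result assembly) can be sketched without any storage access. The function name `extract_batch`, the 0-based inclusive row indexing, and the exact shape of the result dictionary are assumptions made for this example.

```python
import csv
import io
from typing import Any, Dict, List, Optional


def extract_batch(csv_text: str, start_row: int, end_row: int,
                  header_names: Optional[List[str]] = None) -> Dict[str, Any]:
    """Return rows start_row..end_row (0-based, inclusive) from CSV text."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    total_rows = len(rows)

    if start_row < 0 or start_row >= total_rows:
        return {"error": f"Invalid start_row {start_row}; valid range is 0-{total_rows - 1}"}
    # An end_row past the last row is clamped instead of rejected.
    end_row = min(end_row, total_rows - 1)
    if end_row < start_row:
        return {"error": f"end_row {end_row} is before start_row {start_row}"}

    batch = []
    for index in range(start_row, end_row + 1):
        values = rows[index]
        if header_names:
            # Map each column value to its header name.
            data = dict(zip(header_names, values))
        else:
            data = values
        batch.append({"row_index": index, "data": data})

    return {"total_rows": total_rows, "start_row": start_row,
            "end_row": end_row, "rows": batch}
```

Clamping end_row rather than failing lets a caller request fixed-size batches without knowing the exact length of the file.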
2.2 Fetch Upload Job (fetch_upload_job.py)
Purpose:
Retrieves upload job metadata from ev_upload_jobs table with fallback to HTTP API.
Heroku Compatibility: Supports temporary connections before pool initialization. Falls back to HTTP API if PostgreSQL fails.
fetch_upload_job_by_id(job_id: str)
Fetches upload job data using PostgreSQL first, HTTP API as fallback.
Logic Flow:
- 1. Try PostgreSQL fetch via _fetch_via_postgres()
- 2. If fails, try HTTP fetch via _fetch_via_http()
- 3. Return result or None if both methods fail
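The fallback order above can be sketched with two injected callables standing in for _fetch_via_postgres() and _fetch_via_http(). The helper name `fetch_with_fallback` and the choice to treat any exception as "no result" are assumptions for illustration.

```python
from typing import Any, Callable, Dict, Optional

Fetcher = Callable[[str], Optional[Dict[str, Any]]]


def fetch_with_fallback(job_id: str, primary: Fetcher, fallback: Fetcher) -> Optional[Dict[str, Any]]:
    """Try `primary`; use `fallback` if it raises or returns nothing."""
    for fetch in (primary, fallback):
        try:
            result = fetch(job_id)
        except Exception:
            result = None  # treat any failure as "no result" and move on
        if result:
            return result
    return None
```

Passing the fetchers as arguments keeps the ordering logic testable without a database or network connection.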
_fetch_via_postgres(job_id: str)
Logic Flow:
- 1. Get connection from pool with RealDictCursor
- 2. Query ev_upload_jobs by ID (id, workspace_id, user_id, data_type, job_type, mapped_csv_fields, csv_header_rows, etc.)
- 3. Filter csv_header_rows to keep only the row at csv_header_row_position
- 4. Check CSV file existence via _check_csv_exists()
- 5. Reorder result dictionary (core fields first, metadata last)
- 6. Release connection and return ordered result
Database Operations:
Table: ev_upload_jobs, storage.objects | Query: SELECT by ID | Connection: Uses shared pool
2.3 Update Progress (update_progress.py)
Purpose:
Updates job status and progress data in ev_upload_jobs table.
update_job_status(job_id, job_status)
Updates the job_status field in ev_upload_jobs.
Query:
UPDATE ev_upload_jobs SET job_status = %s WHERE id = %s
Return: Tuple (success: bool, error_message: str or None)
update_job_progress(job_id, progress_data)
Updates the progress_data JSONB field in ev_upload_jobs.
Query:
UPDATE ev_upload_jobs SET progress_data = %s WHERE id = %s
Return: Tuple (success: bool, error_message: str or None)
2.4 Error CSV Push (error_csv_push.py)
Purpose:
Formats and uploads error CSV files to Supabase Storage csv_wrong folder.
push_error_csv(all_csv_errors, csv_file_name)
Main function to format, check, delete old, and upload error CSV.
Logic Flow:
- 1. Initialize Result Dictionary
- 2. Format Errors to CSV - Call format_errors_to_csv() with row_index, errors as first columns
- 3. Check if File Exists - Query storage.objects
- 4. Delete Existing File - If found, delete via _delete_error_csv()
- 5. Upload New Error CSV - POST to Supabase Storage
- 6. Return Result - Complete status of all operations
Return Value:
{error_file_found: bool, error_file_deleted: str|None, file_created: bool, file_name: str, message: str}
3. Processing Lookup Functions
3.1 Contact Lookup (contact_lookup.py)
Purpose:
Searches for existing contacts/leads in v_contact_lookup view with two-phase lookup strategy.
lookup_contact(contact_data, company_id, workspace_id)
Two-phase lookup for contacts - first with company context, then without.
Phase 1: Lead Lookup (with company context)
- • Email - Check all email types (valid, catch_all, invalid, unsure) against email_array
- • LinkedIn + company_id - Check linkedin_array with lead_companies_main_id match
- • Xing + company_id - Check xing_array with lead_companies_main_id match
- • Name + company_id - Check first_name/last_name variations with company match
Phase 2: Person Lookup (fallback without company)
- • LinkedIn alone - Check linkedin_array without company constraint
- • Xing alone - Check xing_array without company constraint
Match Strategy: Phase 1 prioritizes lead matches (person at specific company). Phase 2 falls back to person matches (any company). Email has highest priority across all companies.
Return Value:
Tuple (lead_id, person_id, lead_workspace_id, people_workspace_id, lookup_result, error_message)
• Phase 1 success: Returns lead_id and person_id
• Phase 2 success: Returns person_id only (lead_id is None)
• Not found: Returns all None (not an error)
• Error: Returns all None with error message
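A caller can tell the four outcomes apart by inspecting the returned tuple. The helper below is a hypothetical sketch (`classify_lookup` and its outcome labels are not part of the codebase); it only assumes the six-element tuple order given above.

```python
from typing import Optional, Tuple

LookupResult = Tuple[Optional[str], Optional[str], Optional[str],
                     Optional[str], Optional[str], Optional[str]]


def classify_lookup(result: LookupResult) -> str:
    """Name the outcome of a lookup_contact()-style tuple."""
    lead_id, person_id, _lead_ws, _people_ws, _lookup_result, error = result
    if error:
        return "error"
    if lead_id and person_id:
        return "lead_found"      # Phase 1: person at the given company
    if person_id:
        return "person_only"     # Phase 2: person known, no lead
    return "not_found"           # normal outcome, not an error
```

Checking the error element first matters, because both the error case and the not-found case return None for every ID.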
Database Operations:
- • View: v_contact_lookup (main search)
- • Tables: db_leads_workspace, db_people_workspace (workspace lookups)
- • Queries: SELECT with parameterized OR conditions
- • Connection: Uses shared pool, always released
3.2 Company Lookup (company_lookup.py)
Purpose:
Searches for existing companies in v_company_lookup view.
lookup_company(company_name, company_domain, company_linkedin, workspace_id)
Single-phase lookup for companies using ANY identifier match.
Lookup Conditions (OR logic):
- • Name: company_name_array match
- • Domain: company_domain_array match
- • LinkedIn: company_linkedin_array match
Match Strategy: Uses OR logic (any identifier matches). Returns first match (LIMIT 1). No ranking or priority between identifiers.
Return Value:
Tuple (company_main_id, company_workspace_id, error_message)
• Found: Returns company_main_id (and company_workspace_id if workspace_id provided)
• Not found: Returns (None, None, None)
• Error: Returns (None, None, error_message)
Database Operations:
- • View: v_company_lookup (main search)
- • Table: db_companies_workspace (workspace lookup)
- • Queries: SELECT with LIMIT 1 (returns first match)
- • Connection: Uses shared pool, always released
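The OR-based match can be sketched as a query builder that only adds a condition for each identifier that was supplied. The array column names come from the list above; the selected column name `company_main_id`, the function name `build_company_lookup_query`, and the use of `= ANY(...)` for array membership are assumptions about how the view is queried.

```python
from typing import List, Optional, Tuple


def build_company_lookup_query(
    company_name: Optional[str] = None,
    company_domain: Optional[str] = None,
    company_linkedin: Optional[str] = None,
) -> Optional[Tuple[str, List[str]]]:
    """Build an OR query over the identifier arrays, or None if nothing to search."""
    candidates = [
        ("company_name_array", company_name),
        ("company_domain_array", company_domain),
        ("company_linkedin_array", company_linkedin),
    ]
    conditions: List[str] = []
    params: List[str] = []
    for column, value in candidates:
        if value:
            # `%s = ANY(column)` tests membership in a PostgreSQL array.
            conditions.append(f"%s = ANY({column})")
            params.append(value)
    if not conditions:
        return None
    # "company_main_id" is an assumed column name for the view's ID.
    sql = (
        "SELECT company_main_id FROM v_company_lookup "
        f"WHERE {' OR '.join(conditions)} LIMIT 1"
    )
    return sql, params
```

Because the query ends in LIMIT 1 with no ORDER BY, which row comes back when several companies match is up to the database, which is consistent with the "no ranking" behavior described above.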
4. Processing Push Functions
4.1 People Push (people_push.py)
Purpose:
Creates new person records in db_people and identifiers in db_people_identifiers.
create_person_record(contact_data, http_client)
Logic Flow:
- 1. Map Fields for Insertion - Extract non-null values for names, demographics, birth, location, LinkedIn, career, etc.
- 2. Validate Data - Return error if no fields to insert
- 3. Build INSERT Query - Parameterized query with RETURNING *
- 4. Execute INSERT - Insert into db_people, commit, extract person_id
- 5. Track Identifiers - Collect contact_linkedin → 'linkedin', contact_xing → 'xing'
- 6. Batch Insert Identifiers - Use executemany for db_people_identifiers
- 7. Return - (person_id, None) on success
Database Operations:
- • Tables: db_people (main), db_people_identifiers (identifiers)
- • Queries: INSERT with RETURNING, Batch INSERT for identifiers
- • Transaction: Committed after each operation
4.2 Lead Push (lead_push.py)
Purpose:
Creates new lead records in db_leads connecting person to company, with email identifiers.
create_lead_record(contact_data, person_id, company_main_id, http_client)
Logic Flow:
- 1. Map Regular Fields - Position, classification, timeline, summary
- 2. Map Array Fields - lead_sources (convert string to array if needed)
- 3. Build INSERT Query - Include people_id and companies_main_id references
- 4. Execute INSERT - Insert into db_leads, commit, extract lead_id
- 5. Track Email Identifiers - For each email field, determine status (valid, catch_all, invalid, unsure)
- 6. Batch Insert Emails - Use executemany for db_leads_identifiers with status
- 7. Return - (lead_id, None) on success
Database Operations:
- • Tables: db_leads (main), db_leads_identifiers (emails)
- • Queries: INSERT with RETURNING, Batch INSERT for emails
- • Transaction: Committed after each operation
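Steps 2 and 5 above might be sketched as follows. How lead_push.py actually converts lead_sources or determines an email status is not documented here, so both helpers are assumptions: a single string is wrapped in a one-element list, and any status outside the four documented values falls back to 'unsure'.

```python
from typing import Dict, Iterable, List, Optional, Tuple, Union

# The four statuses listed in the logic flow.
ALLOWED_STATUSES = {"valid", "catch_all", "invalid", "unsure"}


def normalize_lead_sources(value: Union[None, str, Iterable[str]]) -> List[str]:
    """Return lead_sources as a list, wrapping a bare string if needed."""
    if value is None:
        return []
    if isinstance(value, str):
        stripped = value.strip()
        return [stripped] if stripped else []
    return list(value)


def build_email_rows(
    lead_id: int, emails: Dict[str, Optional[str]]
) -> List[Tuple[int, str, str, str]]:
    """Build (lead_id, identifier, type, status) rows for a batch insert.

    The row layout and the 'unsure' fallback are assumptions.
    """
    rows = []
    for email, status in emails.items():
        if not email:
            continue  # skip empty email fields
        final_status = status if status in ALLOWED_STATUSES else "unsure"
        rows.append((lead_id, email, "email", final_status))
    return rows
```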
4.3 People Workspace Push (people_workspace_push.py)
Purpose:
Creates workspace connections for people in db_people_workspace.
create_people_workspace_connection(people_id, workspace_id, http_client)
Creates people-workspace connection record.
Query:
INSERT INTO db_people_workspace (people_id, workspace_id) VALUES (%s, %s) RETURNING id
Return: Tuple (people_workspace_id, error_message)
4.4 Lead Workspace Push (lead_workspace_push.py)
Purpose:
Creates workspace connections for leads in db_leads_workspace with optional qualification.
create_lead_workspace_connection(lead_id, workspace_id, lead_qualified_ws, http_client)
Creates lead-workspace connection with optional qualification status.
Logic:
- • Always include: leads_id, workspace_id
- • If lead_qualified_ws provided: Add to fields and params
- • Build dynamic INSERT query with RETURNING id
Return: Tuple (lead_workspace_id, error_message)
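The dynamic INSERT described above can be illustrated with a short builder. The function name is hypothetical, and the "is not None" check (which would still store a falsy value such as False) is an assumption about how "provided" is interpreted.

```python
from typing import Any, List, Optional, Tuple


def build_lead_workspace_insert(
    lead_id: int, workspace_id: str, lead_qualified_ws: Optional[Any] = None
) -> Tuple[str, List[Any]]:
    """Build the INSERT for db_leads_workspace with an optional qualification."""
    # These two fields are always included.
    fields = ["leads_id", "workspace_id"]
    params: List[Any] = [lead_id, workspace_id]
    # The qualification is added only when the caller supplied one.
    if lead_qualified_ws is not None:
        fields.append("lead_qualified_ws")
        params.append(lead_qualified_ws)
    placeholders = ", ".join(["%s"] * len(fields))
    sql = (
        f"INSERT INTO db_leads_workspace ({', '.join(fields)}) "
        f"VALUES ({placeholders}) RETURNING id"
    )
    return sql, params
```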
4.5 Company Workspace Push (company_workspace_push.py)
Purpose:
Creates workspace connections for companies in db_companies_workspace.
push_company_workspace(company_data, workspace_id, company_main_id)
Creates company-workspace connection with optional qualification and tags.
Fields:
- • Always: workspaces_id, companies_main_id
- • Optional: company_qualified (boolean)
- • Optional: company_custom_tags_ws (text[] - converted from comma-separated string)
Return: Tuple (company_workspace_id, error_message)
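The conversion of company_custom_tags_ws from a comma-separated string to a list could be sketched as below. The helper name and the choice to trim whitespace and drop empty entries are assumptions. psycopg2 adapts a Python list to a PostgreSQL array, so the result can be passed directly as a query parameter.

```python
from typing import Iterable, List, Union


def parse_custom_tags(raw: Union[None, str, Iterable[str]]) -> List[str]:
    """Turn 'a, b,,c' or an existing list into a clean list of tags."""
    if raw is None:
        return []
    # Accept either the comma-separated form or an already-split list.
    items = raw.split(",") if isinstance(raw, str) else list(raw)
    tags = []
    for item in items:
        tag = item.strip()
        if tag:  # drop empty entries left by stray commas
            tags.append(tag)
    return tags
```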
4.6 Company Push (company_push.py)
Purpose:
Creates new company records in db_companies_main and identifiers in db_companies_dt_identifiers.
push_company(company_data)
Logic Flow:
- 1. Define Identifier Mappings - name, domain, linkedin, email, phone, social media
- 2. Map Main Table Fields - Company details, address, registration, metrics, media
- 3. Map Array Fields - company_tags (text[]), company_sources (data_sources[])
- 4. Build INSERT Query - If main fields exist: INSERT with casts (%s::text[], %s::data_sources[]). If only identifiers: INSERT DEFAULT VALUES
- 5. Execute Main Insert - Insert into db_companies_main, commit, extract company_id
- 6. Collect and Insert Identifiers - Batch INSERT for all identifier types
- 7. Return - (company_id, None) on success
Identifier Types Supported:
- • name
- • domain
- • linkedin
- • email
- • phone
- • tiktok, youtube, twitter
Database Operations:
- • Tables: db_companies_main (main), db_companies_dt_identifiers (identifiers)
- • Queries: INSERT with RETURNING (or DEFAULT VALUES), Batch INSERT for identifiers
- • Transaction: Committed after each operation
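Step 4 of the logic flow, with its two branches, might look like the following sketch. The function name and RETURNING id are assumptions (the source only states that RETURNING is used); the casts and the DEFAULT VALUES fallback follow the description.

```python
from typing import Any, Dict, List, Optional, Tuple


def build_company_insert(
    main_fields: Dict[str, Any],
    company_tags: Optional[List[str]] = None,
    company_sources: Optional[List[str]] = None,
) -> Tuple[str, List[Any]]:
    """Build the INSERT for db_companies_main, casting array parameters."""
    columns = list(main_fields)
    placeholders = ["%s"] * len(columns)
    params: List[Any] = list(main_fields.values())
    if company_tags:
        columns.append("company_tags")
        placeholders.append("%s::text[]")
        params.append(list(company_tags))
    if company_sources:
        columns.append("company_sources")
        placeholders.append("%s::data_sources[]")
        params.append(list(company_sources))
    if not columns:
        # Only identifiers were supplied: create an empty main record.
        return "INSERT INTO db_companies_main DEFAULT VALUES RETURNING id", []
    sql = (
        f"INSERT INTO db_companies_main ({', '.join(columns)}) "
        f"VALUES ({', '.join(placeholders)}) RETURNING id"
    )
    return sql, params
```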
5. Processing Update Functions
Non-Destructive Update Strategy: All update functions use additive operations. Regular fields are updated only when a new value is provided. Array fields append new items using array_cat() or the || operator. Identifiers are only inserted, never deleted. Existing array items and identifiers are therefore always preserved, and a regular field is overwritten only when a new value is supplied.
5.1 People Update (people_update.py)
Purpose:
Updates existing person records and adds new identifiers (non-destructive).
update_person_identifiers(contact_data, person_id)
Logic Flow:
- 1. Map Fields for Update - Same field mappings as people_push
- 2. Build UPDATE Query - If fields exist: UPDATE db_people SET field1 = %s, ... WHERE id = %s
- 3. Fetch Existing Identifiers - SELECT from db_people_identifiers
- 4. Identify New Identifiers - Check if (identifier, type) NOT in existing set
- 5. Batch Insert New Identifiers - Use executemany
- 6. Return - (fields_updated: bool, identifiers_added: int, error_message)
Non-Destructive: Keeps current field values if new value not provided. Only adds new identifiers, never deletes existing ones.
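Step 4, the check of (identifier, type) pairs against the existing set, can be sketched as a pure function. The name find_new_identifiers is hypothetical, and skipping duplicates inside the same incoming batch is an added assumption.

```python
from typing import Iterable, List, Tuple

IdentifierPair = Tuple[str, str]  # (identifier, type)


def find_new_identifiers(
    existing: Iterable[IdentifierPair], candidates: Iterable[IdentifierPair]
) -> List[IdentifierPair]:
    """Return the candidate pairs that are not stored yet, in input order."""
    seen = set(existing)
    new_rows = []
    for pair in candidates:
        if pair not in seen:
            new_rows.append(pair)
            seen.add(pair)  # avoid inserting the same pair twice in one batch
    return new_rows
```

Because existing rows are never touched, running the update twice with the same input adds nothing the second time.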
5.2 Lead Update (lead_update.py)
Purpose:
Updates existing lead records and adds new email identifiers (non-destructive).
update_lead_identifiers(contact_data, lead_id)
Logic Flow:
- 1. Fetch Existing Record - Get lead_sources array
- 2. Map Regular Fields - Same as lead_push
- 3. Handle Array Field - Find new items not in existing lead_sources, use array_cat() to append
- 4. Execute UPDATE - Update fields and append to arrays
- 5. Fetch Existing Email Identifiers - Build dict of (identifier, type) → status
- 6. Process Email Identifiers - New emails → inserts list, Changed status → updates list
- 7. Batch Operations - INSERT new emails, UPDATE changed statuses
- 8. Return - (fields_updated: bool, identifiers_added: int, error_message)
Non-Destructive: Regular fields keep current values if new not provided. Arrays append new items. Email identifiers are added or status updated (never deleted).
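Steps 5 and 6 above split incoming emails into an inserts list and an updates list. A minimal sketch, assuming the existing identifiers are held in a dict keyed by (identifier, type) as described, and with a hypothetical function name and tuple layout:

```python
from typing import Dict, List, Tuple

Key = Tuple[str, str]              # (identifier, type)
Incoming = Tuple[str, str, str]    # (identifier, type, status)


def diff_email_identifiers(
    existing: Dict[Key, str], incoming: List[Incoming]
) -> Tuple[List[Incoming], List[Incoming]]:
    """Split incoming emails into new rows and rows whose status changed."""
    inserts: List[Incoming] = []
    updates: List[Incoming] = []
    for identifier, id_type, status in incoming:
        key = (identifier, id_type)
        if key not in existing:
            inserts.append((identifier, id_type, status))   # new email
        elif existing[key] != status:
            updates.append((identifier, id_type, status))   # status changed
        # Unchanged emails are left alone; nothing is ever deleted.
    return inserts, updates
```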
5.3 Lead Workspace Update (lead_workspace_update.py)
Purpose:
Updates lead workspace connection fields in db_leads_workspace.
update_lead_workspace(lead_workspace_id, lead_qualified_ws)
Updates qualification status for lead workspace connection.
Query:
UPDATE db_leads_workspace SET lead_qualified_ws = %s WHERE id = %s
Return: Tuple (workspace_updated: bool, error_message)
5.4 People Workspace Update (people_workspace_update.py)
Purpose:
Updates people workspace connection fields (currently no updateable fields).
update_people_workspace(contact_data, people_workspace_id)
Placeholder for future workspace field updates. Currently returns success immediately.
Note: db_people_workspace currently only contains (id, people_id, workspace_id). No updateable fields exist. Function exists for consistency and future extensibility.
Return: Tuple (True, None)
5.5 Company Update (company_update.py)
Purpose:
Updates existing company records and adds new identifiers (non-destructive).
update_company_identifiers(company_data, company_main_id)
Logic Flow:
- 1. Fetch Existing Record - Get company_tags and company_sources arrays
- 2. Map Regular Fields - Same as company_push
- 3. Handle Array Fields - Find new items, use || operator with explicit cast (company_tags || %s::text[], company_sources || %s::data_sources[])
- 4. Execute UPDATE - Update fields and append to arrays
- 5. Fetch Existing Identifiers - Build set of (identifier, type) tuples
- 6. Identify New Identifiers - Check all identifier mappings against existing set
- 7. Batch Insert New Identifiers - Use executemany
- 8. Return - (fields_updated: bool, identifiers_added: int, error_message)
Non-Destructive: Regular fields keep current values. Arrays append new items using || operator with cast. Identifiers are only added, never deleted.
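The array handling in step 3 could be sketched as follows. The function name and the key column id in the WHERE clause are assumptions; the || operator and the two casts are taken from the description.

```python
from typing import Any, List, Optional, Tuple


def build_company_array_update(
    existing_tags: List[str],
    incoming_tags: List[str],
    existing_sources: List[str],
    incoming_sources: List[str],
    company_main_id: int,
) -> Optional[Tuple[str, List[Any]]]:
    """Build an UPDATE that appends only items not already stored."""
    new_tags = [t for t in incoming_tags if t not in existing_tags]
    new_sources = [s for s in incoming_sources if s not in existing_sources]
    set_parts: List[str] = []
    params: List[Any] = []
    if new_tags:
        set_parts.append("company_tags = company_tags || %s::text[]")
        params.append(new_tags)
    if new_sources:
        set_parts.append("company_sources = company_sources || %s::data_sources[]")
        params.append(new_sources)
    if not set_parts:
        return None  # nothing new to append, skip the UPDATE
    sql = "UPDATE db_companies_main SET " + ", ".join(set_parts) + " WHERE id = %s"
    params.append(company_main_id)
    return sql, params
```

With existing tags ["tag1"] and incoming tags ["tag1", "tag2"], only ["tag2"] is appended, so stored items are never duplicated or removed.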
5.6 Company Workspace Update (company_workspace_update.py)
Purpose:
Updates company workspace connection fields (qualification and tags).
update_company_workspace(company_data, company_workspace_id)
Updates workspace qualification and appends custom tags (non-destructive).
Logic:
- 1. Fetch existing company_custom_tags_ws array
- 2. If company_qualified provided: Add to SET clause
- 3. If company_custom_tags_ws provided: Find new tags, append using || operator with ::text[] cast
- 4. Execute UPDATE if fields exist
- 5. Return (workspace_updated: bool, error_message)
Non-Destructive: company_qualified updates to new value. company_custom_tags_ws appends new tags without removing existing ones.
Summary of Patterns and Best Practices
Connection Management Pattern
All functions follow this pattern:
conn = None
cursor = None
try:
    conn = get_pg_connection()
    cursor = conn.cursor(cursor_factory=RealDictCursor)
    # ... database operations ...
    conn.commit()
    cursor.close()
    release_pg_connection(conn)
    return (result, None)
except psycopg2.Error as e:
    if conn:
        conn.rollback()
        release_pg_connection(conn)
    return (None, error_message)  # error_message is built from e
# Cleanup happens in the try and except branches, not in a finally block
Non-Destructive Update Strategy
All update functions use additive operations:
- • Regular fields: Update only if new value provided
- • Array fields: Append new items using array_cat() or || operator
- • Identifiers: Only insert new ones, never delete existing
- • Example: If existing tags = ["tag1"], input tags = ["tag2"], result = ["tag1", "tag2"]
Error Handling Strategy
Cleaning Functions:
- • Try/except around URL parsing
- • Return None for invalid data
- • Never raise exceptions
Lookup Functions:
- • Return None for not found (not an error)
- • Return error message for database failures
- • Always release connections
Push/Update Functions:
- • Rollback transactions on error
- • Return tuple (result, error_message)
- • Always release connections
- • Cascade returns (partial success allowed)
Parameterized Queries
All database queries use parameterized placeholders:
- • Prevents SQL injection
- • Handles special characters
- • Example:
WHERE id = %s, params: (id_value,)
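The effect of parameter binding can be demonstrated without a PostgreSQL server. The sketch below swaps in Python's built-in sqlite3 module purely so it runs standalone; sqlite3 uses ? as its placeholder where psycopg2 uses %s, and the table and values are invented for illustration.

```python
import sqlite3

# In-memory database, used only so this sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT)")

# A value containing quotes and SQL keywords is bound as data, not executed.
hostile_name = "O'Brien'); DROP TABLE people; --"
conn.execute("INSERT INTO people (name) VALUES (?)", (hostile_name,))

# The single-element tuple mirrors the (id_value,) form shown above.
stored_name = conn.execute(
    "SELECT name FROM people WHERE id = ?", (1,)
).fetchone()[0]

# The table still exists because the hostile text was never parsed as SQL.
table_count = conn.execute(
    "SELECT COUNT(*) FROM sqlite_master WHERE type = 'table' AND name = 'people'"
).fetchone()[0]
conn.close()
```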
Batch Operations
Use executemany for multiple inserts/updates:
- • More efficient than multiple execute() calls
- • Used for identifiers (people, leads, companies)
- • Used for email identifiers with status
cursor.executemany(query, [(data1,), (data2,), ...])
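A runnable illustration of the batch pattern, covering both a batch INSERT and a batch status UPDATE. It again uses the built-in sqlite3 module (with ? placeholders) as a stand-in so no PostgreSQL server is needed; the identifiers table and its columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE identifiers (owner_id INTEGER, identifier TEXT, type TEXT, status TEXT)"
)

# One executemany call inserts every row in the list.
new_rows = [
    (1, "a@example.com", "email", "valid"),
    (1, "b@example.com", "email", "unsure"),
]
conn.executemany("INSERT INTO identifiers VALUES (?, ?, ?, ?)", new_rows)

# The same call shape works for a batch of status changes.
status_changes = [("invalid", 1, "b@example.com")]
conn.executemany(
    "UPDATE identifiers SET status = ? WHERE owner_id = ? AND identifier = ?",
    status_changes,
)
conn.commit()

stored = conn.execute(
    "SELECT identifier, status FROM identifiers ORDER BY identifier"
).fetchall()
conn.close()
```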