Standardization is, as the word itself indicates, the act of establishing a standard for something. Standards can be very diverse and can be applied to almost any kind of data. At Deyde DataCentric, we speak of standardization when name and/or address data are processed by separating them into their components and verifying their validity.
When a postal address is separated, it is split into fields, one per component: street type and name, building number, address supplements (floor, door, letter, block and staircase), postal code and town. In the case of names, the information is separated into: first name, connecting particle before the first surname (such as "de" or "de la"), first surname, connecting particle before the second surname and second surname.
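The field breakdown described above could be modelled roughly as follows. This is only an illustrative sketch: the class names, field names and sample values are hypothetical and do not reflect Deyde DataCentric's actual data model.

```python
from dataclasses import dataclass


@dataclass
class PostalAddress:
    """One field per address component (hypothetical field names)."""
    street_type: str
    street_name: str
    number: str
    supplements: str  # floor, door, letter, block, staircase
    postal_code: str
    town: str


@dataclass
class PersonName:
    """One field per name component (hypothetical field names)."""
    first_name: str
    first_surname: str
    second_surname: str = ""
    first_surname_link: str = ""   # particle before the first surname
    second_surname_link: str = ""  # particle before the second surname


# Invented sample records, for illustration only.
address = PostalAddress("AVDA", "CONSTITUCION", "12", "3 B", "28001", "MADRID")
person = PersonName(
    first_name="MARIA",
    first_surname_link="DE LA",
    first_surname="FUENTE",
    second_surname="GARCIA",
)
```

Keeping the connecting particles in their own fields means the surnames themselves can be compared and matched without the particles getting in the way.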
Standardizing does not only mean dividing an address or name into the corresponding fields; it also means applying rules so that each word is always written the same way, regardless of how it appears in the source data.
For example, we convert the different ways in which the term “Avenue” can be written, such as AVD; Avd; AV; AVA; Avinida; Abenida,… into a single common form: “AVDA”. The same applies to abbreviations in surnames and first names: for example, “MTNEZ” is standardized as “MARTINEZ” once processed and is identified as a surname.
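A minimal sketch of this kind of rule, assuming a simple dictionary lookup, might look like the code below. The lookup tables and function names are hypothetical, and the entries "AVENIDA" and "AVDA" are added as an assumption alongside the variants listed above. A real system would need much larger dictionaries and probably fuzzy matching as well.

```python
import unicodedata

# Hypothetical lookup tables built from the variants mentioned in the text.
STREET_TYPE_VARIANTS = {
    "AVD": "AVDA",
    "AV": "AVDA",
    "AVA": "AVDA",
    "AVINIDA": "AVDA",
    "ABENIDA": "AVDA",
    "AVENIDA": "AVDA",  # assumption: the full word maps to the same form
    "AVDA": "AVDA",     # assumption: the target form maps to itself
}
SURNAME_ABBREVIATIONS = {"MTNEZ": "MARTINEZ"}


def _key(token: str) -> str:
    """Uppercase, drop accents and edge punctuation so variants compare equal."""
    decomposed = unicodedata.normalize("NFKD", token)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return no_accents.upper().strip(" .,;")


def standardize_street_type(token: str) -> str:
    """Return the common form of a street type, or the cleaned token if unknown."""
    key = _key(token)
    return STREET_TYPE_VARIANTS.get(key, key)


def standardize_surname(token: str) -> str:
    """Expand a known surname abbreviation, or return the cleaned token."""
    key = _key(token)
    return SURNAME_ABBREVIATIONS.get(key, key)
```

With these tables, standardize_street_type("Abenida") and standardize_street_type("Avd.") both return "AVDA", and standardize_surname("Mtnez") returns "MARTINEZ". Unknown tokens are returned uppercased and unaccented rather than rejected.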
Name data are not only standardized but also enriched with the gender associated with the original name. Address data are checked against official sources to verify whether they are valid, and indicators are added that show the degree of validation of the postal address and its reliability.
At Deyde DataCentric we also standardize phone numbers: non-numeric characters are removed, the prefix is assigned and the first digits of the number are validated. The same happens with the DNI (the Spanish national identity document number), which we standardize by removing commonly inserted characters such as full stops, hyphens and commas, so that a common format is maintained. E-mail addresses are treated similarly: we remove unusual characters and correct the domains of the most common e-mail providers, such as Gmail or Hotmail, when they are misspelled.
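The three clean-up steps above can be sketched as follows. This is an illustration under stated assumptions, not Deyde DataCentric's actual rules: the function names, the default prefix "34", the rule that a Spanish national number has nine digits starting with 6, 7, 8 or 9, and the table of misspelled domains are all assumptions made for the example.

```python
import re
from typing import Optional

# Hypothetical table of misspelled e-mail domains, for illustration only.
DOMAIN_FIXES = {
    "gmial.com": "gmail.com",
    "gmai.com": "gmail.com",
    "hotmial.com": "hotmail.com",
    "hotmal.com": "hotmail.com",
}


def standardize_phone(raw: str, default_prefix: str = "34") -> Optional[str]:
    """Strip non-digits, assign the prefix and check the leading digit."""
    digits = re.sub(r"\D", "", raw)
    if digits.startswith("00"):  # international dialling prefix
        digits = digits[2:]
    if len(digits) == 9:  # assumption: bare national number, add the prefix
        digits = default_prefix + digits
    national = digits[len(default_prefix):]
    # Assumption: valid national numbers have 9 digits and start with 6-9.
    if (not digits.startswith(default_prefix)
            or len(national) != 9
            or national[0] not in "6789"):
        return None
    return "+" + digits


def standardize_dni(raw: str) -> str:
    """Remove full stops, commas, hyphens and spaces; uppercase the letter."""
    return re.sub(r"[.,\-\s]", "", raw).upper()


def standardize_email(raw: str) -> str:
    """Drop unusual characters, lowercase, and fix known misspelled domains."""
    cleaned = re.sub(r"[^\w.+\-@]", "", raw.strip()).lower()
    local, sep, domain = cleaned.rpartition("@")
    if not sep:  # no "@" found: return the cleaned text unchanged
        return cleaned
    return local + "@" + DOMAIN_FIXES.get(domain, domain)
```

For instance, standardize_phone("600 123 456") gives "+34600123456", standardize_dni("12.345.678-z") gives "12345678Z", and standardize_email("Juan.Perez @gmial.com") gives "juan.perez@gmail.com". A phone number that fails the leading-digit check returns None, so the caller can flag it instead of storing an invalid value.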