Validating, normalizing, and formatting phone numbers in Rails
Oftentimes in applications, the phone column is stored as a string (or an integer 😱) in the database and is a mess at the whim of the users. There’s occasionally a presence validation thrown in there for good measure or maybe even a format check for 10 digits, but these can fall short. Hyphens and parentheses are conventions for human readability, that is, for formatted values. But at the database level, we want valid and uniform data where practical.
Is it valid?
If a phone number is invalid, then there is no need to normalize… it’s useless. But how do we determine validity? Phone numbers are tricky to get right, especially once you need to support international numbers. So a library or API proves essential for covering most cases. Maybe this is overkill, but if usable phone numbers are necessary for your application, then they’re (maybe?) worth getting right.
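To see why a hand-rolled check falls short, here is a minimal sketch in plain Ruby. The `ten_digits` pattern and the sample values are my own illustration, not something from a library:

```ruby
# A naive "exactly 10 digits" format check (illustrative only).
ten_digits = /\A\d{10}\z/

samples = [
  "4155552671",       # digits only: accepted
  "(415) 555-2671",   # same number, human formatting: rejected
  "+44 20 7183 8750", # UK number in international format: rejected
  "0000000000"        # ten digits but not a usable number: accepted
]

results = samples.to_h { |input| [input, ten_digits.match?(input)] }
results.each { |input, accepted| puts "#{input.inspect} accepted? #{accepted}" }
```

The pattern rejects perfectly good input and accepts garbage, which is the gap a dedicated library is meant to close.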
One name rises to the top when digging into phone number validation: libphonenumber by Google. This library has been ported to many languages and is widely referenced. Even Twilio, the developer platform for communications, uses it for their phone Lookup API:
Twilio will use libphonenumber, Google’s open source phone number handling library, to properly format possible phone numbers for a given region, using length and prefix.
There’s one problem: libphonenumber has no official Ruby version. Sadly we can’t use the library directly, but there are several ports available. libphonenumber directly references the telephone_number gem, but I typically reach for another called Phonelib. It has a lightweight interface that works great for utility purposes. Here’s what it looks like:
Phonelib.valid?("123456789") #=> false
And it includes a phone validator for ActiveRecord models:
validates :attribute, phone: true
Boom, easy, done.
The phone number is valid… now what?
Great! Now we need to store it. But have you considered the variety of phone number formats: dot vs space vs hyphen separated, parens vs no-parens, plus all the international variants? We could use a JS mask and enforce it with a format regex, but not all numbers should be formatted the same way.
Enter stage left: “normalization,” the process of making something “normal”…
normal: conforming to a type, standard, or regular pattern
– Merriam-Webster Dictionary
…and based on that definition, it sounds a lot like something we’d want in our applications.
But how do we define normal?
Luckily there’s a pretty good answer to this: E.164. This is a spec for formatting phone numbers that ensures each device has a “globally unique number.” That sounds like exactly the kind of thing we should be storing. In fact, this is the recommendation by Twilio:
If you plan to store the phone number after validating, we recommend storing the full E.164 formatted number[.]
The E.164 format is most easily identified by a “+” prefixing a number. At a high level, the structure is “+”, then country code, then subscriber number. Here’s an example: “+14155552671”.
But how do we convert something like “(415) 555-2671” into its proper E.164 format for storage? Again, Phonelib can help:
Phonelib.parse("(415) 555-2671", "US").full_e164 #=> "+14155552671"
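As a rough sanity check on what ends up in storage, the E.164 shape can be sketched as a plain Ruby pattern. This is an assumption-laden illustration: the `e164_shape` name is mine, and it only checks the shape (a “+”, a non-zero first digit, at most 15 digits in total), not whether the number is valid. It also doesn’t allow for an extension suffix such as “;123”.

```ruby
# Shape check only; validity is still the library's job.
e164_shape = /\A\+[1-9]\d{1,14}\z/

normalized = e164_shape.match?("+14155552671")   # => true
formatted  = e164_shape.match?("(415) 555-2671") # => false, still human-formatted
zero_first = e164_shape.match?("+0123456789")    # => false, country codes never start with 0
```

Something like this could catch values that slipped into the column without being normalized, but it is no substitute for parsing.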
But isn’t a phone number just a number?
In Google’s Falsehoods Programmers Believe About Phone Numbers (definitely worth a read through), they specifically address this in #25 Phone numbers are numbers:
Never try to store phone numbers as an int or any other kind of numeric data type. You can’t do arithmetic on them[.]
So there you have it. You can’t do math on phone numbers, so a string is probably a better candidate.
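A quick plain-Ruby illustration of what a numeric type throws away (the sample values are made up for the example):

```ruby
# A national-format number that starts with a zero.
typed = "0612345678"
round_trip = typed.to_i.to_s # => "612345678", the leading zero is gone

# An E.164 string loses its "+" the same way.
e164 = "+14155552671"
without_plus = e164.to_i.to_s # => "14155552671"
```

Neither value survives the round trip through an integer, while a string column stores exactly what was normalized.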
But now the phone number is not human-friendly 🙄
This is true. Normalized data is great for computers, but not so much for the users. We should now circle back around and offer a way to display human-friendly phone numbers. Again, a technically difficult problem, as conventions for formatted phone numbers vary by country.
According to Google’s style guide for phone numbers, there are two main phone number formats to handle: North American (NANP) and international. So for USA, Canada, and other NANP countries (indicated by country code “1”), we use the national format: “(415) 555-2671”, but elsewhere the international format: “+44 20 7183 8750”. Fortunately, these conventions are both supported by Phonelib:
Phonelib.parse("+14155552671").full_national # US
#=> "(415) 555-2671"
Phonelib.parse("+442071838750").full_international # UK
#=> "+44 20 7183 8750"
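The choice between the two styles can be sketched without any library at all. Here `display_style` is a hypothetical helper of my own, assuming the input is already a stored E.164 string:

```ruby
# Country code "1" covers the whole NANP, and no other country code
# begins with 1, so the "+1" prefix is enough to tell them apart.
display_style = lambda do |e164|
  e164.start_with?("+1") ? :national : :international
end

us_style = display_style.call("+14155552671")  # => :national
uk_style = display_style.call("+442071838750") # => :international
```

This only picks the style; the actual formatting is still best left to Phonelib.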
And now for adding a validated, normalized, formatted phone to an ActiveRecord model in Rails, phew 😅
Add the Phonelib gem
# Gemfile
gem "phonelib"
Add some global config to avoid variances in usage
# config/initializers/phonelib.rb
Phonelib.default_country = "US"
Phonelib.extension_separate_symbols = ["x", ";"]
default_country is used when parsing a phone number without a country prefix.
extension_separate_symbols defines which characters indicate a phone number extension. The default is “;” but we also want to support “x”, “ex”, and “ext”… all covered by specifying “x”.
Validate the phone number to prevent bad data
The Phonelib library adds a PhoneValidator class, activated by phone: true. Depending on whether the field is required or not, you can also add an allow_blank: true option.
# app/models/business.rb
class Business < ApplicationRecord
  validates :phone, phone: true, allow_blank: true
end
Note: phone: true refers to the PhoneValidator, not the phone attribute on the model.
Normalize the phone number before persisting
We want to normalize the record just before persisting to the database. Callbacks might be out of vogue, but they prove the point here in plain speak, “before save normalize phone.”
class Business < ApplicationRecord
  before_save :normalize_phone

  # ...

  private

  def normalize_phone
    self.phone = Phonelib.parse(phone).full_e164.presence
  end
end
The #presence call at the end is to handle an empty phone number. The #full_e164 method returns an empty string in that case, but it would be preferable to store that in the database as NULL.
Format the phone number for display
The #phone method returns the E.164 representation of the phone number, so let’s add a second #formatted_phone method as a display-friendly variation.
class Business < ApplicationRecord
  # ...

  def formatted_phone
    parsed_phone = Phonelib.parse(phone)
    return phone if parsed_phone.invalid?

    formatted =
      if parsed_phone.country_code == "1" # NANP
        parsed_phone.full_national # (415) 555-2671;123
      else
        parsed_phone.full_international # +44 20 7183 8750
      end
    formatted.gsub!(";", " x") # (415) 555-2671 x123
    formatted
  end

  # ...
end
Firstly, we have to consider that the phone attribute might not be valid. For instance, during form validation the user could enter a garbage value. So instead of displaying nothing on re-render, we actually want to present the user with exactly what they entered along with an error message. That’s what the guard is determining: if invalid, then show phone unmodified, else proceed with formatting.
For formatting, we want NANP numbers to use the “national” phone number format, but all others the “international” format. We can make this distinction by looking for a “1” country code. Now let’s zoom in to what’s going on in the national formatting case:
parsed_phone = Phonelib.parse("+14155552671;123")
#=> #<Phonelib::Phone>
parsed_phone.full_national
#=> "(415) 555-2671;123"
parsed_phone.full_national.gsub(";", " x")
#=> "(415) 555-2671 x123"
We first parse the phone number to get back a Phonelib::Phone instance. Then, #full_national returns the national formatted phone number, but with the extension separated by a semicolon… not very pretty. So we use #gsub to swap that out for something that looks a bit nicer. NOTE: the extension separator needs to match up with the list specified in the initializer.
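One detail worth spelling out: the model method uses #gsub! and then returns formatted on its own line. A plain-Ruby sketch of why, using literal strings in place of Phonelib output:

```ruby
with_extension    = "(415) 555-2671;123"
without_extension = "(415) 555-2671"

# gsub returns a new string whether or not anything matched.
swapped   = with_extension.gsub(";", " x")    # => "(415) 555-2671 x123"
unchanged = without_extension.gsub(";", " x") # => "(415) 555-2671"

# gsub! edits the receiver in place and returns nil when nothing matched,
# so its return value is not safe to use as the method's result.
copy = without_extension.dup
bang_result = copy.gsub!(";", " x")           # => nil
```

Returning the #gsub! result directly would hand back nil for any number without an extension, hence the extra line.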
class Business < ApplicationRecord
  before_save :normalize_phone

  validates :phone, phone: true, allow_blank: true

  def formatted_phone
    parsed_phone = Phonelib.parse(phone)
    return phone if parsed_phone.invalid?

    formatted =
      if parsed_phone.country_code == "1"
        parsed_phone.full_national
      else
        parsed_phone.full_international
      end
    formatted.gsub!(";", " x")
    formatted
  end

  private

  def normalize_phone
    self.phone = Phonelib.parse(phone).full_e164.presence
  end
end
Leaning on libphonenumber is an industry-recognized option, but it doesn’t guarantee 100% confidence. Still, considering phone numbers in the separate terms of validation, normalization, and formatting is a valuable exercise that can (hopefully) lead to better phone number handling as needed.