Availability in Distributed Systems

In a typical distributed software environment, a key metric besides latency that correlates well with the health of the business is “availability”.

Availability is usually measured as the percentage of time the software is up (that is, available) rather than down. It is vital to modern software organizations. ‘High availability’, or HA for short, is a term commonly used by software professionals. It refers to a software system with very little downtime.
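
To make the percentage definition concrete, here is a minimal sketch of the arithmetic in Python. The function names, the 30-day measurement window, and the 99.9% target are assumptions chosen for illustration; they are not taken from any particular monitoring tool or SLA.

```python
SECONDS_PER_DAY = 24 * 60 * 60
THIRTY_DAYS = 30 * SECONDS_PER_DAY  # hypothetical measurement window


def availability_percent(total_seconds: float, downtime_seconds: float) -> float:
    """Percentage of the window during which the system was up."""
    if total_seconds <= 0:
        raise ValueError("total_seconds must be positive")
    if not 0 <= downtime_seconds <= total_seconds:
        raise ValueError("downtime_seconds must be between 0 and total_seconds")
    return 100.0 * (total_seconds - downtime_seconds) / total_seconds


def allowed_downtime_seconds(target_percent: float, total_seconds: float) -> float:
    """Downtime budget implied by an availability target over the window."""
    if not 0 <= target_percent <= 100:
        raise ValueError("target_percent must be between 0 and 100")
    return total_seconds * (1.0 - target_percent / 100.0)


if __name__ == "__main__":
    # Illustrative numbers only: 2,592 seconds of downtime in a 30-day window.
    measured = availability_percent(THIRTY_DAYS, downtime_seconds=2592)
    budget_minutes = allowed_downtime_seconds(99.9, THIRTY_DAYS) / 60
    print(f"measured availability: {measured:.3f}%")
    print(f"downtime budget at 99.9%: {budget_minutes:.1f} minutes")
```

Under these assumptions, a 99.9% target over 30 days leaves a downtime budget of roughly 43 minutes, which shows how quickly the budget shrinks as the target gains more nines.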

Let’s go back to our hypothetical social network from the previous article, in which you are a senior … Read more

Distributed Software Systems at Scale

One of the most potent skills a senior engineer can possess in modern software engineering orgs is the ability to navigate distributed software systems at a global scale. Designing, constructing, and maintaining such systems can be an immensely challenging yet profoundly satisfying intellectual exercise. For some of the top players in the industry, like Google or Meta/Facebook, the responsibilities of lead engineers are akin to those of commanders overseeing a global battlefield. The business relies on you and your peers to operate massive global systems with a myriad of wild variables.

Distributed software systems is a well documented … Read more