Service Magic

MSP SLA - Guide to Managed Service Agreements

Written by Matt Linn | Sep 19, 2024 4:00:00 PM

Service Level Agreements are a kind of business contract between managed service providers and their clients. These documents lay out the scope of services and set expectations for how the relationship will work. SLAs are considered "best practice" in the MSP industry and help to prevent a plethora of problems ranging from client disappointment and pestering to legal disputes.

An effective Service Level Agreement (SLA) document will be tailor-made to fit the specific offerings of the MSP and can't necessarily be drawn up from a generic template. However, there are certain commonalities across MSP SLAs that will serve as a guide as you create your own.

What Is an MSP Service Level Agreement?

A Service Level Agreement is a legally binding contract between a managed service provider and a client. It clearly outlines all of the terms of their business relationship, establishes the obligations of both parties, and functions as a guarantee for the client as far as the level of service they can expect.

SLAs should be drawn up and signed at the beginning of every new client contract and then be updated every 18 to 24 months or when there are changes to the agreement for other reasons.

Why Are Service Level Agreements Important?

A managed services SLA document is critical for several reasons:

  1. Establishes clear expectations. An SLA facilitates a clear understanding between both parties from the beginning about what is and isn't included, the client's and MSP's obligations, and the level of service to expect. This prevents disappointment, misunderstandings, resentment, and scope creep.
  2. Clarifies what you are (and aren't) responsible for. Clients may see your MSP as responsible for everything tech-related from the moment you begin your partnership. An SLA clarifies what is and isn't your responsibility to avoid misplaced blame.
  3. Specifies the terms of remediation. The SLA explains what will be done if the service provider fails to meet one or more service quality benchmarks. Outlining a specific form of redress, such as a monthly service credit (a common choice), prevents problems with renegotiations, haggling, and potential legal disputes for a perceived breach of contract.
  4. Helps with internal planning and benchmarking. Clear service expectations in your SLAs—which should be consistent across the board—help you plan for and measure the same key performance indicators internally.
  5. Supports a perception of professionalism and trust. Creating a well thought-out managed services agreement upfront shows that you are professional and trustworthy. Small MSPs (which often don't bother with SLAs) should consider implementing these documents to help them stand out in a crowded market.

An SLA essentially protects the interests of the MSP and the client. This sets you up for a positive and hopefully smooth long-term business relationship.

Things to Include in an SLA

There are several basic items that every Service Level Agreement should have. You will fill out each section based on your offerings and the specific objectives of each MSP-client partnership.

Scope of Services

The scope of services section outlines exactly which services are included in the monthly price. This is critical because it prevents clients from expecting one-time or ongoing services for free when those services are not included in the agreement.

The scope of services should include things like:

  • Hardware
  • Software
  • Servers
  • Monitoring services
  • Helpdesk services (and hours)
  • Others

Tip: Use generic rather than branded terms when describing what is included. This allows you to change vendors without having to rewrite your SLAs.

Compliance Metrics

These are the service quality benchmarks that clients can expect from the MSP. Your MSP's KPIs must be measurable and not vague or open to interpretation. They must also be realistic—it's better to be conservative and overperform than to make impressive-sounding claims that you won't be able to fulfill.

Service Availability

Service availability describes the percentage of uptime clients can expect from your network, VoIP phone system, and cloud services. Take into account scheduled maintenance and service outages due to software or hardware failures, network failures, human errors, and unforeseen circumstances such as power outages or natural disasters.

This standard should also define what constitutes "downtime" in terms of the number of endpoints affected or an error rate threshold. Recovery expectations after an outage are typically expressed as the RTO (recovery time objective) and RPO (recovery point objective):

  • Recovery Time Objective: The RTO is the maximum allowable time that a computer, application, network, or system can be down after a disaster or failure.
  • Recovery Point Objective: The RPO is the maximum allowable amount of data (measured in time) that can be lost as the result of a disaster or failure.

Setting standards for these metrics is essential, as vague definitions (or a lack of definitions) could lead to a client perceiving that you haven't met this standard when a single endpoint is temporarily down.
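To make an availability percentage concrete for clients, it helps to translate it into an allowed amount of downtime. The following is a minimal Python sketch of that arithmetic, assuming a 30-day billing period; the function names and the 99.9% figure are illustrative and do not come from any particular tool.

```python
def downtime_budget_minutes(availability_pct: float, period_days: int = 30) -> float:
    """Minutes of downtime allowed in the period at a given availability target."""
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability_pct must be between 0 and 100")
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - availability_pct / 100)


def measured_availability_pct(downtime_minutes: float, period_days: int = 30) -> float:
    """Availability actually delivered, given the downtime recorded in the period."""
    total_minutes = period_days * 24 * 60
    if not 0 <= downtime_minutes <= total_minutes:
        raise ValueError("downtime_minutes must fit within the period")
    return 100 * (total_minutes - downtime_minutes) / total_minutes


# Example: a hypothetical 99.9% target over a 30-day month
budget = downtime_budget_minutes(99.9)
print(f"Allowed downtime: {budget:.1f} minutes")
```

Under these assumptions, a 99.9% target leaves roughly 43 minutes of downtime per 30-day month, which is a useful sanity check before committing to a number.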

Response Times

This is the average time it takes your team to respond to critical and non-critical issues. Define clearly what constitutes a critical issue (such as a company-wide network outage) and a non-critical issue (such as configuring a new employee's password) and the response times for each. You could also stratify response time based on the client's subscription level with greater priority given to top-tier clients.
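As a rough illustration, response time targets per priority can be written down as data and checked against ticket timestamps. This is a sketch only: the priority names and the 15-minute and 4-hour targets below are hypothetical placeholders, not recommended values.

```python
from datetime import datetime, timedelta

# Hypothetical response targets; set these to whatever your SLA actually promises.
RESPONSE_TARGETS = {
    "critical": timedelta(minutes=15),
    "non_critical": timedelta(hours=4),
}


def response_breached(priority: str, opened_at: datetime, first_response_at: datetime) -> bool:
    """Return True if the first response took longer than the target for this priority."""
    target = RESPONSE_TARGETS[priority]
    return (first_response_at - opened_at) > target


opened = datetime(2024, 9, 19, 9, 0)
print(response_breached("critical", opened, opened + timedelta(minutes=20)))
print(response_breached("non_critical", opened, opened + timedelta(hours=1)))
```

Tiered subscriptions could be handled the same way by keying the table on both the subscription level and the priority.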

Tip: Implementing Thread is a highly effective way of improving your response and resolution times because the software automatically triages each issue, categorizes it, and allocates the ticket ("thread") to a technician according to urgency, availability, and expertise. This provides the client with an immediate response and a faster resolution.

Resolution Times

Resolution times refer to how quickly critical and non-critical issues will be resolved. It is better to err on the conservative side here and be very clear to protect yourself in the case of factors that are outside of your control (such as a natural disaster). You also need to specify the resolution times for different kinds of ticket or service issues.

Note: Some MSPs use "mean time to resolution" or MTTR as their resolution time standard.
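For reference, MTTR is simply the average of the individual resolution durations over a reporting period. A minimal sketch, assuming you can export each ticket's time-to-resolve as a duration (the function name is illustrative):

```python
from datetime import timedelta


def mean_time_to_resolution(durations: list[timedelta]) -> timedelta:
    """Average resolution time across a set of resolved tickets."""
    if not durations:
        raise ValueError("at least one resolved ticket is required")
    # sum() needs a timedelta start value because the default start is the integer 0
    return sum(durations, timedelta()) / len(durations)


sample = [timedelta(hours=1), timedelta(hours=3)]
print(mean_time_to_resolution(sample))
```

Because a few very long tickets can skew an average, some teams may prefer to report a median or percentile alongside MTTR; that choice is outside what this sketch covers.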

Modified Resolution Times

You could provide modified resolution times for issues that occur outside of business hours when the client's premises are locked and you need physical access to resolve the issue. This could be presented under a subheading covering "service limitations."

First-Time Resolution (FTR)

First-time resolution is a measurement of the percentage of tickets that are resolved with only one interaction. A high first-time resolution rate is preferable for a positive customer experience.
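One simple way to compute an FTR rate is to count how many resolved tickets needed exactly one interaction. The sketch below assumes you can export an interaction count per resolved ticket; the function name is hypothetical.

```python
def first_time_resolution_rate(interactions_per_ticket: list[int]) -> float:
    """Percentage of resolved tickets that were closed after exactly one interaction."""
    if not interactions_per_ticket:
        raise ValueError("at least one resolved ticket is required")
    first_time = sum(1 for count in interactions_per_ticket if count == 1)
    return 100 * first_time / len(interactions_per_ticket)


# Four resolved tickets, two of which needed only one interaction
print(first_time_resolution_rate([1, 1, 2, 3]))
```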

Time to Notification

This is the average time to identify an issue and notify the client. Shorter times to notification are better to minimize the impact of the issue on the client’s operations.

MOS Scores

Mean Opinion Score (MOS) measures the perceived quality of a VoIP call on a scale from 1.0 to 5.0. Within this range, anything below 3.5 is considered unacceptable and 4.3 is considered an excellent goal.
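The two thresholds mentioned above can be turned into a simple classification for reporting. This is only a sketch: the 3.5 and 4.3 cut-offs come from the text, while the "acceptable" label for the band in between is an assumption added for illustration.

```python
UNACCEPTABLE_BELOW = 3.5  # from the text: scores under 3.5 are unacceptable
EXCELLENT_FROM = 4.3      # from the text: 4.3 is an excellent goal


def classify_mos(score: float) -> str:
    """Label a MOS value using the two thresholds above."""
    if score < UNACCEPTABLE_BELOW:
        return "unacceptable"
    if score >= EXCELLENT_FROM:
        return "excellent"
    # Assumption: anything between the two thresholds is treated as acceptable
    return "acceptable"


for value in (3.2, 3.9, 4.4):
    print(value, classify_mos(value))
```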

Throughput

Throughput measures how much data a service can process in a given timeframe. This might be transactions per second, requests per second, or data transfer rates.

Error Rate

This refers to the percentage of the time that a request results in an error when using a service. A high error rate might mean capacity issues or a bug in the system.

Capacity Violations

Capacity refers to the available resources, such as bandwidth or storage space. MSPs that provide cloud services and storage solutions need to monitor capacity, as getting close to full capacity could slow down the client's system or prevent them from being able to save important files.
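Capacity monitoring usually comes down to comparing usage against agreed thresholds before the limit is actually reached. A minimal sketch, assuming storage is measured in gigabytes; the 80% and 95% thresholds are placeholder values, not industry standards.

```python
def capacity_status(used_gb: float, total_gb: float,
                    warn_at_pct: float = 80.0, critical_at_pct: float = 95.0) -> str:
    """Classify storage usage as 'ok', 'warning', or 'critical'."""
    if total_gb <= 0:
        raise ValueError("total_gb must be positive")
    used_pct = 100 * used_gb / total_gb
    if used_pct >= critical_at_pct:
        return "critical"
    if used_pct >= warn_at_pct:
        return "warning"
    return "ok"


print(capacity_status(850, 1000))
```

Alerting at the warning level gives you time to expand capacity or talk to the client before their system slows down or files fail to save.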

Insider insight: Each metric or quality standard is often colloquially referred to as an "SLA." For example: "response time SLA" or "resolution time SLA." The entire document is the "SLA proper."

The Client's Obligations

Clients must also be held responsible for their actions in the SLA, as this directly affects the MSP's ability to provide timely support and resolution. For example, the client's obligations might include the following:

  • Take out (a specific type and level of) cyber insurance.
  • Notify the MSP of issues promptly.
  • Provide accurate and complete information about each issue.
  • Notify the MSP if they make any changes to settings on their end.
  • Provide clear access to the premises if on-site services are part of the scope of services.
  • Refrain from installing unauthorized software.

Communication

The SLA should define the touchpoints and lines of communication between the MSP and the client, how to report issues, and escalation procedures if an issue isn't resolved on first contact.

Contingencies and Remediation

This section should outline what the MSP will do if for some reason they are not able to meet a compliance metric in a given month, with caps on any potential penalties. This is essential as it provides clear avenues for repair and enables the business relationship to continue after something goes wrong.
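A capped service credit is easier to agree on when the schedule is spelled out explicitly. The following sketch shows one possible shape for such a schedule; the tiers, percentages, and 50% cap are entirely hypothetical and should be replaced with the terms in your own agreement.

```python
# Hypothetical schedule: (minimum measured availability %, credit as % of monthly fee).
# Tiers are listed from best to worst availability.
CREDIT_TIERS = [(99.9, 0.0), (99.0, 10.0), (95.0, 25.0)]
MAX_CREDIT_PCT = 50.0  # cap on the total credit, also applied below the lowest tier


def service_credit(monthly_fee: float, measured_availability_pct: float) -> float:
    """Return the credit owed for the month under the hypothetical schedule above."""
    credit_pct = MAX_CREDIT_PCT
    for floor_pct, tier_credit_pct in CREDIT_TIERS:
        if measured_availability_pct >= floor_pct:
            credit_pct = tier_credit_pct
            break
    return round(monthly_fee * min(credit_pct, MAX_CREDIT_PCT) / 100, 2)


print(service_credit(2000, 99.5))
```

Keeping the cap as a separate constant makes it obvious to both parties what the maximum exposure is in any given month.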

Security and Compliance

Define clearly to what extent you are responsible for security and compliance with laws surrounding things like confidential information and payment processing, and areas that are the client's responsibility. Also make it clear which backup services are and aren't included to avoid confusion and resentment in the case of data loss.

Legal

Outline the extent to which your MSP is liable. Include a "hold harmless" clause or similar indemnification policies to protect yourself in the case of third-party claims, a security breach, a service outage, or other disaster that happens for reasons that are beyond your control. The global average cost of a data breach in 2024 was $4.88M, according to IBM’s 2024 report. You don’t want that kind of weight resting on your shoulders!

Changes to the Agreement

Create a policy for making changes to the agreement if either party wishes to do so. You could also include a clause that covers price rises due to inflation to avoid having to redo your contracts when your costs increase.

Termination Policy

This section should outline the conditions under which either party can request termination, how this request should be initiated (such as in writing), and the amount of notice required. It should also clarify the MSP's liability after termination, such as whether or not the MSP will hold backups of the client's data, previous client SLAs, and so on.

SLA Best Practices

Apply the following best practices to your SLAs:

  • Draw up SLAs that are specific to your MSP.
  • Cover every possible scenario.
  • Ensure every standard is measurable and don't leave any room for interpretation. Use SMART goals (Specific, Measurable, Attainable, Realistic, Time-bound) in your SLAs.
  • Make sure your internal KPIs match those expressed in your client-facing SLAs.
  • Update your SLAs whenever there is a change to the service, or every 18 to 24 months. Keep copies of previous SLAs and make sure these are available to the client.
  • Make your SLAs transferable in case of an MSP merger or acquisition. This provides peace of mind for your clients.
  • Monitor the metrics in your SLAs accurately and consistently for efficient reporting. Thread helps with this by completing time entries automatically.
  • Work with the most effective software solutions to help you meet your service quality benchmarks consistently.

Clarify Things from the Start with an SLA

Service Level Agreements take time to put together but are essential for starting off each MSP-client relationship on the right foot. Make sure your SLAs cover every possible scenario and continue to refine and improve them over time.

The next priority for your MSP will be to fulfill the terms of your SLAs to the letter (and preferably go above and beyond!). Integrating software solutions like Thread helps you deliver "service magic" every time and cut down your response and resolution times for happy clients and more money in your pocket at the end of the day.