The Dataprise Blog

An MSP's Deep Dive Into Email PT III: Troubleshooting

Nov 04, 2020 BY Ben Birnstein

Now that we have examined the principles of mail flow and the role of the cloud in email in previous installments of this series, I want to conclude with a crash course on how to troubleshoot and resolve mail flow issues. Keep in mind that every case is different, and reading this article will not automatically make you an expert in mail flow issues, but it will provide a solid overview that even a layman can understand.

Determine the Scope

To solve any mail flow issue, the first step is narrowing down where the problem originates: with an individual user's device or behavior, or with the mail system as a whole. The easiest way to determine this is to find out whether multiple users are having similar issues; if not, it is generally safe to assume that the issue is not caused by the overarching mail system, but by the user’s individual device or behavior.

For example, say a user is having difficulties receiving mail. If the user is only missing mail from a single address, the problem is most likely related to that specific address. Alternatively, if a user is missing all incoming mail, but no other users have lost any, it is most likely an issue with that user’s mailbox.
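The scoping logic above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the `MissingMailReport` type, the `classify_scope` function, and the scope labels are hypothetical names invented for this example, and a real triage would weigh more evidence than a list of reports.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MissingMailReport:
    """One report of missing mail (hypothetical structure for illustration)."""
    user: str    # mailbox that did not receive the message
    sender: str  # address the missing message was sent from


def classify_scope(reports):
    """Guess the scope of an issue from a list of missing-mail reports."""
    users = {report.user for report in reports}
    if not users:
        return "no_reports"
    if len(users) > 1:
        # Several users are affected: suspect the overarching mail system.
        return "mail_system"
    senders = {report.sender for report in reports}
    if len(senders) == 1:
        # One user, one sender: suspect that specific address.
        return "single_sender"
    # One user missing mail from several senders: suspect the mailbox.
    return "single_mailbox"
```

For instance, a single report from one user about one partner address comes back as `single_sender`, while reports from two different users come back as `mail_system`.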

Determine the Symptoms

Once the issue scope has been determined, it’s important to examine the symptoms of the problem. Just like a doctor taking care of a sick patient, the symptoms of your IT issue will provide valuable information which you can trace back to the root cause.

The best place to start building an understanding of the symptoms of your IT issue is to interview the end-user to receive a firsthand account of the problem. Many users’ first instinct is to focus on how the problem affects them (e.g., “the email a partner sent me did not show up in my inbox”). However, these types of descriptions typically do not contain enough information to reveal the symptoms of the issue. To get this information, make sure to ask probing questions about the user’s specific experience, and any specific messages or windows which may have appeared on the system when the issue occurred.

Once you have gained as much information as possible from the user, access the device, system, or user account in play to determine additional symptoms. To extend the doctor comparison, this is like taking an x-ray or blood sample after hearing the patient’s account of their injury or illness.

Follow the Mail Flow

Once the scope and symptoms of a mail flow issue have been determined, the easiest way to discover the underlying problem is to follow the mail flow. Simply put, following the mail flow means tracing a message’s path based on its sender, recipient, subject, or other criteria. This enables you to determine at which phase in the mail flow process the message was lost.

To illustrate the benefits of following the mail flow, here are a few examples:

  • If message traces show that the message was received at or around the time it was sent, the next step is to determine whether it was blocked by a mail filter, spam filter, mail rule, or something else.
  • If the message reached the sender’s outbound mail filter, the filter’s logs can show whether the message was queued and whether delivery was attempted.
  • If no message is received at the boundary of the recipient’s email system, the issue is likely caused by a problem within the sender’s environment.
  • If a message is verified as having been delivered to the recipient, it was likely filtered out by a message rule or spam filter.
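To make the idea of "finding the phase where the message was lost" concrete, here is a minimal Python sketch. It assumes the trace results have already been collected and reduced to a list of stage names; the stage names, `furthest_stage`, and `next_step` are hypothetical and do not correspond to any particular product's message trace output.

```python
# Hypothetical stages of a message's journey, in the order they occur.
STAGES = [
    "sender_queued",        # accepted and queued by the sender's system
    "sender_sent",          # handed off by the sender's system
    "recipient_received",   # seen at the boundary of the recipient's system
    "recipient_delivered",  # delivered to the recipient's mailbox
]

# Where to look next, keyed by the furthest stage the message reached.
NEXT_STEPS = {
    None: "No trace found: confirm with the user that the message was sent.",
    "sender_queued": "Check the sender's queue and outbound mail filter.",
    "sender_sent": "Never reached the recipient boundary: look at the sender side.",
    "recipient_received": "Check the recipient's mail filter, spam filter, and mail rules.",
    "recipient_delivered": "Check mailbox rules and the junk folder.",
}


def furthest_stage(events):
    """Return the latest known stage present in the trace events, or None."""
    seen = [STAGES.index(event) for event in events if event in STAGES]
    return STAGES[max(seen)] if seen else None


def next_step(events):
    """Suggest where to investigate based on how far the message got."""
    return NEXT_STEPS[furthest_stage(events)]
```

Calling `next_step(["sender_queued", "sender_sent"])` points the engineer at the sender side, since the message never appeared at the recipient's boundary.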

A Two-Way Street 

Part of the challenge within mail flow troubleshooting is that it requires coordination between two separate organizations and IT departments (i.e., sender and recipient). A mail flow issue which originates with the sender organization may require the cooperation and help of the recipient organization’s IT department to resolve – or vice versa. This makes mail troubleshooting a complicated two-way street which requires transparency and collaboration between the sender and recipient.

For example, if the sender organization identifies that an email sent by an employee has not reached its destination, that organization’s IT department will generally begin investigating the issue. However, if the email message is verified as having left the sender organization, their IT department’s hands are tied. At that point, it is then up to the recipient organization to troubleshoot the problem.

Example Issue

Often the best way to learn new troubleshooting strategies is through practice and example. Below is a sample of the troubleshooting process for a specific mail-related issue.

Scenario

An organization has recently decided to move from an on-premises Exchange server to a cloud-based Office 365 environment. After configuration and proper setup of domains, connectors, and rules, end-users are unable to receive messages from outside the organization.

Troubleshooting Actions

Since the environment is still in testing, there is no direct end-user to interview. Instead, the engineer jumps straight to following the mail flow and sends a test message to the newly configured system from an external sender with otherwise functional outgoing mail flow. The message is not delivered.

Symptoms

After examining message traces on the recipient side, the engineer discovers that no messages are being received, and that the external sender is not receiving any non-delivery reports (NDRs), or “bounce messages.” This indicates that the mail is not reaching the new environment at all: either it is being routed to some other destination, or there is no server at the destination to reject the message and send a bounce-back.

Solution

By isolating the symptoms of the issue and determining where in the mail flow the problem is occurring, the engineer can correctly determine and verify the root cause and facilitate a resolution. In this case, the mail exchanger (MX) record was not updated when the organization transitioned to the new environment. With the MX record still pointing to the decommissioned on-premises Exchange server, there was nothing on the receiving side to process the message. Having identified the root cause, the engineer can ensure that the MX record is reconfigured to point to the new cloud environment.
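A check like this one is easy to script. The sketch below assumes the MX records have already been looked up (for example with `nslookup -type=mx`) and are passed in as `(preference, host)` pairs. The function name `stale_mx_hosts` and the hostnames are hypothetical, and the `mail.protection.outlook.com` suffix is an assumption about what the new environment's MX target looks like; the correct value should be confirmed in the tenant's own domain settings.

```python
def stale_mx_hosts(mx_records, expected_suffix):
    """Return MX hosts that do not fall under the expected mail domain.

    mx_records: iterable of (preference, host) pairs from a DNS lookup.
    expected_suffix: domain the new environment's MX hosts should end with.
    """
    suffix = expected_suffix.lower().rstrip(".")
    stale = []
    for _preference, host in sorted(mx_records):
        name = host.lower().rstrip(".")  # DNS names may carry a trailing dot
        if name != suffix and not name.endswith("." + suffix):
            stale.append(name)
    return stale


# Hypothetical records: one still points at the old on-premises server.
records = [
    (0, "corp-example.mail.protection.outlook.com."),
    (10, "mail.corp.example."),
]
print(stale_mx_hosts(records, "mail.protection.outlook.com"))
# prints ['mail.corp.example']
```

An empty result suggests every MX host points at the new environment; anything in the list is a candidate for the kind of leftover record described in this scenario.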

Conclusion 

While mail flow issues may seem simple, especially to the end-user (remember “my email won’t send”?), there is often more complexity than expected. By gathering a firsthand account from the end-user and following the mail flow, engineers can determine the scope of the issue and the symptoms associated with it. Once the symptoms and scope are determined, the engineer will eventually be able to determine the root cause and resolve the issue. However, many issues require collaboration between both the sender and recipient organizations in order to resolve.
