Extracting sent and received mail statistics for any Exchange Online mailbox
Updated: Feb 7, 2022
I was recently asked to obtain some figures on the number of emails particular mailboxes were sending and receiving to help understand the volumes and look for opportunities to optimise workflows. This seems like it should be a straightforward ask but as always there are some complications. This post will explain my process and challenges, if you're just interested in the code you'll find the full script at the bottom of the article.
Checking what's available natively
Step one, don't reinvent the wheel. Checking my usual go-to reporting source of the Office 365 admin center, indeed the information is there (and more such as read / deleted mail statistics) but this is only available for licensed users so what about shared mailboxes or Microsoft 365 group mailboxes that are typically unlicensed?
Note: Be aware that by default your usage statistics may be anonymised to protect user privacy. This can be changed by following the instructions here however I do advise caution as this is user sensitive data.
Moving on to the Exchange Admin Center we can use a message trace to pull back messages sent or received by specified mailboxes but if we want to consolidate these into totals we'd need to make a search, export it to CSV then make the appropriate calculations, rinse and repeat. In essence it would be very laborious. Thankfully Microsoft provide a PowerShell cmdlet that allows access to the same function: Get-MessageTrace.
The Get-MessageTrace cmdlet will only return results for the past 10 days, if we wish to go back any further instead we need to use a historical search (Start-HistoricalSearch). While the data retrieved by both cmdlets is the same there are some differences between them, when running a message trace the results are available instantly on demand whereas a historical search queues the job, usually returning results at a later time. Equally there is a tenant limit of 250 historical searches per 24 hour period so be careful if submitting lots of requests!
For my purpose, 10 days was sufficient and from this I could easily build the historical view over time by executing the script periodically. It is also worth mentioning that the script in its current form only pulls complete days - I am not collecting statistics for the current day as this is still in progress and therefore the numbers will change.
Coding & challenges
The Get-MessageTrace cmdlet uses pagination. For those unfamiliar paging allows us to cater for vast amounts of data without overwhelming the server by splitting results into separate pages, it'll be a familiar concept for anyone that has used REST APIs before. For this cmdlet we can have a maximum of 1000 pages with a page size of 5000 (the total results returned in one execution). This means in total we could viably obtain 50,000 messages before needing to refine our other parameters to narrow the search.
In order to obtain the pages of data we must pass in a "Page" cmdlet parameter with the number of the page we are interested in, so if we wanted the second page containing messages 101 to a possible maximum of 200 we'd use the following command:
In reality depending on the amount of emails your mailboxes deal with, 5000 messages may far exceed your need but to keep things manageable I stuck with a page size of 1000 which is the default value if no page size is specified. This is stored within a variable in the script so can easily be amended if you wish to use larger or smaller pages.
While REST APIs give you a handy link to the next batch of records unless I'm overlooking something entirely I couldn't see any equivalent to automatically retrieve the next set of results so I instead resorted to wrapping the cmdlet in a custom function to handle the paging.
The function logic increments the page number until the number of result returned doesn't match the page size we are using. To illustrate, if the total message count is 2300 and we are using a page size of 1000. The first and second executions will return 1000 results each meaning we need to continue however the third execution will return 300 results which is less than our 1000 page size so we know we've reached the end.
Consolidating sent emails into the user context
From Exchange's point of view when sending an email that has 5 recipients spread across any of the addressee fields (To, CC, BCC), it is sending 5 distinct emails, one to each recipient. This is fine from a service side but if we are thinking about emails within an end user context only one email has been sent. I wanted to compile this user context sent figure so I set about inspecting the data returned by a trace to see how it could be consolidated down. I found that messages sent to multiple recipients share the same sent timestamp and equally the same subject.
Armed with this knowledge we can create a hashtable based on a combined element field of "Received" and "Subject" (yes for some reason sent messages store their timestamp in the "Received" field).
This gives the following output in the "userContextSent" variable - each row is an email sent within the user context and the value shows the number of messages sent by Exchange.
While the documentation for Get-MessageTrace states:
"Use the short date format that's defined in the Regional Options settings on the computer where you're running the command."
I have not found this to be the case, hence less of a challenge, more of a PSA than anything else, the date format output by the script is U.K (dd/mm/yyyy) whereas in the getMails function it is converted to U.S format (mm/dd/yyyy). You can change the output to suit your needs by altering the PSObject that is written to the results variable:
The full code
Note: if you have a large number of mailboxes to iterate over then I suggest swapping the array of mailboxes hardcoded in the script to instead import from a text file. This can be done by changing the definition of the "mailboxes" variable as follows and specifying each mailbox name on a new line in the referenced text file
Finally here is an example of the results CSV file. All in all achieving exactly what I set out to do.
Combining everything together from a coding standpoint gives the following script:
Overall a good exercise, there are some optimisations that could be made but it fully satisfies my original requirement. I hope someone finds it useful!