On December 5, 2023, EDRM, the renowned community of e-discovery and legal professionals, brought together Doug Austin, the editor of eDiscovery Today, and Brian Coleman, Digital Forensics and Insider Threat Director at Pfizer, to discuss electronically stored information (ESI) collection strategies for e-discovery. The webinar, entitled “Streamline and Save: Smart ESI Collection Strategies for eDiscovery,” was moderated by Mary Mack.
Austin and Coleman discussed data challenges in organizations today, e-discovery use cases, ESI collection challenges, the benefits of using a strategic approach to collection, and how to build and maintain such an approach.
The big ESI collection challenge starts with the volume and velocity of data. It’s hard to comprehend how much data we create worldwide annually. Austin cited estimates from Statista, which predicts we will generate 120 zettabytes in 2023 and 181 zettabytes in 2025 (one zettabyte is about one trillion gigabytes). Coleman paused to consider all the metadata behind that data and advised listeners to be strategic about the data they collect. There is not only more data, but it has also become more complex: it is created by various applications, including collaboration software and social media apps, in variable file types stored on-premises and in the cloud. This leads to “complex data collection that requires significant upfront work before you collect,” said Coleman.
Austin and Coleman discussed other challenges in e-discovery collection, including lost metadata, data security, over-collection, targeted collection, and potential errors in custodian-directed collection. If an e-discovery process is insecure and not forensically sound, collections are prone to data leakage and loss. Attorneys and their clients risk exposing sensitive documents and waiving privilege, especially if there are multiple copies of data. Over-collecting increases the costs of processing, review, and production, and it can create privacy issues. Yet, “targeted collection is tricky,” said Coleman.
Organizations and their outside counsel fear missing data that may not be available to them later in the process. To reduce risk, “put up guardrails,” said Coleman. Use the time frames of the litigation or investigation and ask custodians fundamental questions at the outset. Don’t rely solely on what custodians say in questionnaires; verify their answers against what activity and log files show they actually did.
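The cross-check Coleman describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the event fields ("source" and "date"), and the sample data are hypothetical and do not come from the webinar or from any particular logging product.

```python
from datetime import date


def unreported_sources(declared, log_events, start, end):
    """Return sources that appear in activity logs inside the matter's
    time frame but that the custodian did not list in the questionnaire.

    declared   -- source names from the questionnaire (hypothetical input)
    log_events -- dicts with "source" and "date" keys (hypothetical schema)
    start, end -- inclusive date range of the litigation or investigation
    """
    declared_norm = {name.strip().lower() for name in declared}
    seen_in_logs = {
        event["source"].strip().lower()
        for event in log_events
        if start <= event["date"] <= end
    }
    return sorted(seen_in_logs - declared_norm)


# Made-up sample data for illustration only.
events = [
    {"source": "Email", "date": date(2023, 3, 1)},
    {"source": "Slack", "date": date(2023, 4, 12)},
    {"source": "Dropbox", "date": date(2021, 1, 5)},  # outside the time frame
]
gaps = unreported_sources(["email"], events, date(2023, 1, 1), date(2023, 12, 31))
print(gaps)
```

Any source the function returns is a prompt for a follow-up question to the custodian, not proof of concealment.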
Custodians may attempt to alter or hide documents during discovery. You must gain visibility into such potential activity, says Coleman, who engages in a triage process before collecting data. Consider the various custodian devices, who owns them, and the data stored on them. This data mapping exercise also determines what data is not on the device but stored in network shares and the cloud. Although primary, on-premises ESI sources may not pose problems, Coleman advised using vendor connectors for remote and cloud repositories whenever possible. If you create and maintain your own custom code libraries for collection, you increase your costs and risk introducing errors into a collection.
Vendor-supported connectors can obtain rich metadata and activity logs, providing insight into how custodians use apps. Metadata and activity logs show where data actually resides, which reduces the risk of missing a source. With that information, you can automate collection tasks into workflows to reduce human error.
As you onboard data sources, Coleman urges you to obtain and analyze activity logs. Pfizer ingests 60 terabytes of data and activity logs into a security information and event management (SIEM) system to identify who brings security risks to a matter and where specific data types reside.
Automating collection tasks can avoid custodian-driven collection errors, said Coleman. For example, custodians may miss or ignore a data source or fat-finger uniform resource identifiers (URIs) or locators (URLs).
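As a hedged illustration of how an automated workflow might catch a mistyped URL before a collection runs, the sketch below uses Python's standard urllib.parse module. The function name and the specific checks are assumptions made for this example, and the example.com addresses are placeholders. The checks catch only obvious typing mistakes and do not confirm that a source exists.

```python
from urllib.parse import urlsplit


def check_source_url(url):
    """Return a list of problems found in a data-source URL.

    An empty list means the URL passed these basic checks; it does not
    prove that the location exists or is reachable.
    """
    problems = []
    if url != url.strip() or " " in url:
        problems.append("contains whitespace")
    try:
        parts = urlsplit(url.strip())
    except ValueError:
        return problems + ["cannot be parsed as a URL"]
    if parts.scheme not in ("http", "https"):
        problems.append("scheme is not http or https")
    if not parts.hostname or "." not in parts.hostname:
        problems.append("missing or incomplete host name")
    return problems


# Placeholder URLs for illustration only.
for candidate in (
    "https://example.com/sites/legal",
    "htps://example.com/sites/legal",
    "https:/example.com/sites/legal",
):
    print(candidate, check_source_url(candidate))
```

A workflow could refuse to queue any source whose problem list is not empty and route it back to an analyst for correction.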
Toward a Strategic Approach to E-Discovery Collection
Austin and Coleman discussed strategic approaches to e-discovery, including security considerations, planning metrics for success, and correcting errors to keep workflows evergreen.
Work closely with vendors to keep application programming interfaces (APIs) and connectors current, advised Coleman. Give vendors feedback so they can fix errors and add features that meet your growing collection requirements, and ask recurring technical questions to improve workflows and automation. Where possible, use Python scripts to mimic human behavior.
Track metrics to improve the collection process and reduce automation errors. Keep track of the time it takes to collect, the time to image, the time to review, and more. And keep automation errors at or below ten percent, said Coleman.
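A minimal sketch of that kind of tracking follows, assuming each automated collection run is recorded with its source, its duration, and whether it succeeded. The CollectionRun record, the summarize function, and the sample figures are hypothetical; only the ten percent threshold comes from Coleman's remarks.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class CollectionRun:
    source: str                # hypothetical label for the data source
    minutes_to_collect: float  # elapsed time for this run
    succeeded: bool            # False if the automation failed


def summarize(runs, max_error_rate=0.10):
    """Summarize collection runs and compare the automation error rate
    with a threshold (ten percent by default)."""
    if not runs:
        raise ValueError("no collection runs to summarize")
    failures = sum(1 for run in runs if not run.succeeded)
    error_rate = failures / len(runs)
    return {
        "runs": len(runs),
        "error_rate": error_rate,
        "avg_minutes_to_collect": mean(run.minutes_to_collect for run in runs),
        "within_threshold": error_rate <= max_error_rate,
    }


# Made-up figures for illustration: ten runs, one failure.
sample = [CollectionRun("mailbox", 30.0, True) for _ in range(9)]
sample.append(CollectionRun("chat", 50.0, False))
report = summarize(sample)
print(report)
```

The same pattern extends to time to image and time to review by adding fields to the record.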
When you automate collection activities, develop scripts and workflows in controlled environments, and push them to production only after rigorous testing. Scripts have keys to connectors, so limit access to them and monitor their use.
Maintain ancillary data on users and keep a current data map to know what sources they access with which devices. This information should be available to an analyst when custodians’ data is possibly relevant to litigation, investigations, arbitrations, incident response, privacy requests (DSARs), internal audits, and other e-discovery use cases.
“There is no silver bullet for collecting every piece of data,” said Coleman. Email and apps may contain novel document attachments. Social media and collaboration apps like Microsoft Teams and Slack use proprietary threaded messaging with file-sharing capabilities. “Choose the right tool for the task and validate that it collects what you want, including metadata,” said Coleman.
Austin and Coleman did not address mobile device data collection directly. Many tools are available, but only ModeOne brings a strategic approach to smartphone discovery. Using ModeOne’s automated technology, you can collect smartphone evidence anywhere worldwide, with same-day service, and target only relevant custodian data within a set time frame, ignoring private unrelated files. The remote collection solution, which eliminates the need for a physical collection kit and onsite forensics technicians, uses a patented, secure, SaaS framework that automatically captures, encrypts, and transmits the raw data to an ISO-certified data center where it is ingested and processed, including logging, decryption, and extraction. Processed data is normalized and visually formatted for ease of review with role-based access controls. You can easily search, filter, and export the data into Relativity for expedited review.
Corporations, their attorneys, and service providers enjoy an easy, fast, secure, and cost-effective experience with ModeOne. It also addresses the personal privacy needs of data custodians, who don’t have to relinquish their phones or stop using them during the collection process.