
Google Cloud Filestore and Object Storage Best Practices

Introduction

Let's talk about storage on Google Cloud. Let's talk about the small things that make a big difference. This article isn't about comparing Google Cloud Filestore and Google Cloud Storage. If you're at the point of choosing between them, you're past reading blog posts. This article is about the subtle ways we misunderstand the nature of data in the cloud. It's about the assumptions we make, the myths we tell ourselves, the dogmas we group-think, and the simple truths that often get overlooked.

When we talk about object storage, specifically Cloud Storage on GCP, we're referring to a reliable and scalable vault for massive amounts of data. If you are creating the next Dropbox or Spotify, you may want to start with a GCS bucket as the backend for storing files. It is cheap, abstracted, and flexible.

Then there's Filestore, GCP's answer to traditional file systems. Think of it as the file-friendly powerhouse. It's not just about storing files; it's about doing so with the performance that high-demand applications—like databases, virtual machines, and GKE deployments—require. Filestore shines when you need fast, file-based access without compromising on performance.

Avoiding Fine-Grained Permissions in GCS

Object-level permissions seem powerful, but they quickly become a tangled web of inconsistencies, which leads to security vulnerabilities and management headaches. We find that for many of our clients, adopting uniform bucket-level access as the default is a sound practice. This means defining clear IAM roles for different user groups. Google strongly encourages the use of uniform access controls, particularly for buckets containing highly sensitive data (Source). When you do need granular control, Google advocates alternative methods that don't rely on fine-grained ACLs. One method is signed URLs, which grant time-limited access to a specific object. Another is managed folders within buckets: by creating a separate managed folder for each customer, you can assign specific IAM policies to these folders. Segmenting access at the folder level, rather than at individual objects, aligns with Google's vision of using roles and uniform bucket-level access to streamline and secure access.
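To make the folder-per-customer idea concrete, here is a minimal sketch of the policy you might attach to each managed folder. It only builds the data structure; it doesn't call any Google API. The function name, the customers/ prefix, and the example group address are our own assumptions for illustration, and the role shown is just one reasonable read-only choice.

```python
import json


def customer_folder_policy(customer_id, group_email, role="roles/storage.objectViewer"):
    """Build an IAM-style policy for one customer's managed folder.

    Hypothetical helper: the 'customers/' layout and the return shape are
    assumptions for this sketch, not something Cloud Storage prescribes.
    """
    if not customer_id or "/" in customer_id:
        raise ValueError("customer_id must be a non-empty name without slashes")
    return {
        # Managed folder names end with a trailing slash.
        "managed_folder": f"customers/{customer_id}/",
        # Standard IAM policy shape: a list of role-to-members bindings.
        "policy": {
            "bindings": [{"role": role, "members": [f"group:{group_email}"]}]
        },
    }


print(json.dumps(customer_folder_policy("acme", "acme-readers@example.com"), indent=2))
```

The point of the design is that one group maps to one folder. Nobody has to reason about individual objects, and an audit becomes a list of folders and their bindings.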

In cases where even managed folders and signed URLs don't provide the necessary control, it’s worth exploring options for implementing security mechanisms external to bucket storage. For example, if you're responsible for a mobile or web application using Firebase, Google recommends using supplementary tools like Firebase Security Rules. These provide the fine-grained control you need while avoiding the limitations of ACLs.

Furthermore, Google views ACLs as a legacy system, prone to creating security gaps when used in conjunction with IAM (Source). The two systems do not interoperate predictably—ACLs can grant permissions that IAM policies do not recognize, potentially leading to unintended public access to private data. For example, if a bucket's IAM policy restricts access to a select group, but an object within that bucket has an ACL setting it to public, the object will still be accessible to anyone. This disjointed behavior underscores why uniform bucket-level access is generally safer and more reliable.
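The public-object scenario above is easier to see in a toy model. The sketch below is not the Cloud Storage authorization engine, and every class and function name in it is made up for illustration. It only encodes the rule described above: with fine-grained access, a grant from either IAM or an ACL is enough, while uniform bucket-level access takes ACLs out of the picture.

```python
from dataclasses import dataclass, field

ALL_USERS = "allUsers"  # the principal that stands for "anyone on the internet"


@dataclass
class ToyBucket:
    iam_readers: set = field(default_factory=set)
    uniform_bucket_level_access: bool = False


@dataclass
class ToyObject:
    name: str
    acl_readers: set = field(default_factory=set)


def _grants(principals, user):
    """A principal set grants access if it names the user or allUsers."""
    return user in principals or ALL_USERS in principals


def can_read(user, bucket, obj):
    """Toy read check: IAM OR ACL, unless uniform access disables ACLs."""
    if _grants(bucket.iam_readers, user):
        return True
    if bucket.uniform_bucket_level_access:
        return False  # object ACLs are ignored entirely
    return _grants(obj.acl_readers, user)


bucket = ToyBucket(iam_readers={"group:analysts"})
report = ToyObject("reports/q1.csv", acl_readers={ALL_USERS})
print(can_read("anonymous", bucket, report))  # the ACL leaks the object
bucket.uniform_bucket_level_access = True
print(can_read("anonymous", bucket, report))  # the IAM policy is now the only gate
```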

Additionally, the following features are not available when working with ACLs: managed folders, IAM Conditions, domain-restricted sharing, and workforce identity federation.

Ultimately, while it's possible to manage the complexity introduced by fine-grained ACLs through rigorous coding and policy enforcement, Google's best practices guide users towards simpler, more unified control mechanisms. These practices not only enhance security but also simplify management and compliance auditing, making them a prudent choice for any organization handling sensitive data.

Object storage is a Zebra

Object storage is a conceptual ghost. A phantom. A specter. You may have recursively copied directories in blob storage, but object storage has no directories. Your blob may be on a "path", but that is really just a name with some slashes in it. It is an illusion. Object storage is an abstraction. The directory structure is syntactic sugar on top of that abstraction, not part of its core concept.
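If "a name with some slashes in it" sounds too glib, here is a small sketch that produces a directory-style listing from nothing but a flat list of names. It mimics the prefix-and-delimiter style of listing that object stores expose. It talks to no real bucket, and the function name is ours.

```python
def list_objects(names, prefix="", delimiter=None):
    """Emulate a prefix/delimiter listing over a flat collection of object names.

    Returns (items, prefixes): the names directly "inside" the prefix, and the
    pseudo-folders one level below it. No directory ever exists; both lists are
    derived purely from string operations on the names.
    """
    items, prefixes = [], set()
    for name in sorted(names):
        if not name.startswith(prefix):
            continue
        rest = name[len(prefix):]
        if delimiter and delimiter in rest:
            # Roll everything up to the first delimiter into one "folder".
            prefixes.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            items.append(name)
    return items, sorted(prefixes)


names = ["index.html", "logs/readme.md", "logs/2024/01.txt", "logs/2024/02.txt"]
print(list_objects(names, delimiter="/"))
print(list_objects(names, prefix="logs/", delimiter="/"))
```

Drop the delimiter and the "folders" vanish: the same call returns every name that starts with the prefix. That is what a recursive copy is really doing.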

This section will explore the fundamental differences between object storage and traditional file systems, particularly how these differences manifest when using tools like Cloud Storage FUSE. We'll examine the limitations and quirks of object storage, highlighting why it should be approached with a different mindset than conventional file systems.

Cloud Storage FUSE is an open source FUSE adapter that lets you mount and access Cloud Storage buckets as local file systems. It is used to bridge the gap between that abstraction and the functionality of a filesystem.

FUSE stands for Filesystem in Userspace.

I say this from experience: GCS FUSE has its limits. It is a bridge, not a magic wand. While it offers a familiar file system interface, it is not a one-to-one replacement. Concurrent writes, hard links, and file locking either behave differently under FUSE or are missing altogether. Use it strategically for tasks that benefit from file system access, but don't expect it to transform object storage into something it's not. It does not enforce POSIX-style permissions (access is still governed by Cloud Storage IAM), and it does not offer hard links, file locking, or in-place overwrites: changing a file means rewriting the whole object. Nor will it bring latency anywhere near that of a local file system. Object storage has comparatively high latency, so it can't serve, for example, as the backend for a transactional database, and GCS FUSE makes no claim to change that.

GCS FUSE is not only a shadow of POSIX (it offers symbolic links but not hard links, for example); it is also incompatible with certain behaviors unique to object storage. Take object versioning: Cloud Storage FUSE does not formally support buckets that have object versioning enabled, and using it with them can produce unpredictable behavior. The same goes for mounting buckets with lifecycle policies.

With GCS FUSE, you can mount a bucket, or a subdirectory of a bucket, as a pretend file system. You can mount, but not tame, object storage. The same is true of the noble and rambunctious zebra!

Object storage: now with hierarchical namespaces!

As of June 2024, when creating buckets, you can set the never-before-seen property hierarchical_namespace_enabled to true. And suddenly, the object storage abstraction will have directories. This feature brings an intuitive, hierarchical structure to your data, making it easier to organize and manage large datasets, especially for teams familiar with traditional file systems. The ability to navigate through a directory-like structure improves accessibility and simplifies the transition from file-based systems to object storage. While it has its limitations and is still new, hierarchical namespaces have the potential to bridge the gap between file-based workflows and the scalability of object storage.

Yet even this newly added feature, promising as it is, comes with constraints. For starters, hierarchical namespaces can only be enabled when creating a new bucket, meaning that existing buckets with flat namespaces cannot be retrofitted with this feature. Additionally, several key features commonly available in standard object storage are absent in this preview. Soft delete, which lets you recover accidentally deleted objects, is not supported. Autoclass, which automates the movement of data between storage classes, is also unavailable. Object versioning, which is vital for tracking changes to objects and rolling back to previous versions, is off the table, as are object-level ACLs (access control lists) that provide fine-grained permissions. Furthermore, object retention locks and bucket locks, which help safeguard data from being deleted or altered, are not compatible with hierarchical namespaces.
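Because the namespace choice is permanent for a bucket, it pays to check your wish list before you create it. Below is a minimal sketch of such a pre-flight check. It encodes only the constraints listed above, as they stood when this was written, so treat the list as a snapshot and confirm it against the current documentation. The dictionary keys are hypothetical names for this sketch, not API fields.

```python
# Features the paragraph above lists as unavailable on hierarchical-namespace
# buckets. Keys are made-up config names; values are human-readable labels.
HNS_INCOMPATIBLE = {
    "soft_delete": "soft delete",
    "autoclass": "Autoclass",
    "object_versioning": "object versioning",
    "object_acls": "object-level ACLs",
    "object_retention_lock": "object retention lock",
    "bucket_lock": "bucket lock",
}


def hns_conflicts(desired):
    """Return the requested features that conflict with a hierarchical namespace.

    `desired` is a plain dict of booleans describing the bucket you want.
    An empty list means no conflict with the list above.
    """
    if not desired.get("hierarchical_namespace"):
        return []  # flat namespace: none of these restrictions apply
    return [label for key, label in HNS_INCOMPATIBLE.items() if desired.get(key)]


wanted = {"hierarchical_namespace": True, "object_versioning": True, "autoclass": True}
print(hns_conflicts(wanted))
```

A check like this can sit in a provisioning script or a code review checklist, so the conflict surfaces before the bucket exists and not after the data has landed.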

Moreover, while hierarchical namespace-enabled buckets can be viewed in the Google Cloud console, managing their folder structures through the console is not supported. For that, you'll need to rely on the command line, REST APIs, or client libraries for folder management tasks.

Like Cloud Storage FUSE, hierarchical namespaces attempt to lend the familiar comforts of file systems to object storage, but they are ultimately an abstraction. They bring order and simplicity to cloud storage, yet remind us—just like the zebra—that taming a wild system always comes with limitations.

Intelligent Backup and Snapshot Management

Embrace automated backups. Think of them as low-cost insurance for your data. Regularly back up your Filestore directories to a different region. It's not about fearing disaster; it's about respecting its possibility. And don't just back up, strategize: prioritize critical data and use managed solutions for granular recovery.

With Filestore, this means backups and snapshots. With object storage, this means Object Versioning, which creates a history of all changes, allowing you to restore previous versions if needed. Versioning safeguards against overwrites that cause data quality regressions, so you can roll back data in your data lake in step with rolling back the corresponding code change.

Test your backups. A Filestore backup is only as good as its restoration. Regularly test restoring backups, like a dress rehearsal for a potential data disaster.
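A dress rehearsal needs a pass/fail criterion. One simple option, sketched below, is to restore into a scratch directory and compare file checksums against the live data. This is a generic technique, not a Filestore feature, and the function names and report shape are our own. It assumes both trees are mounted locally and that files are small enough to read into memory; for large files you would hash in chunks.

```python
import hashlib
from pathlib import Path


def tree_digest(root):
    """Map each file's path (relative to root) to the SHA-256 of its contents."""
    root = Path(root)
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def compare_restore(source_root, restored_root):
    """Report files that are missing, unexpected, or changed after a restore."""
    src, dst = tree_digest(source_root), tree_digest(restored_root)
    return {
        "missing": sorted(src.keys() - dst.keys()),
        "unexpected": sorted(dst.keys() - src.keys()),
        "changed": sorted(k for k in src.keys() & dst.keys() if src[k] != dst[k]),
    }
```

Three empty lists mean the rehearsal passed. Keep in mind that a live share keeps changing after the backup is taken, so compare against a quiesced copy or expect some legitimate differences.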

There is no such thing as a free lunch

Data access patterns will change from quarter to quarter and year to year. Therefore, your storage strategy must be equally adaptable.

Consider the following triad of choices:

First, you can use tools like Google Cloud's Storage Insights, commit to analyzing your usage trends on a regular basis, and adjust storage classes based on what you find. With this option you pay a little for the tool but spend precious time on continual administration, a distraction from your business. The tools aren't expensive, and the added observability can inform other business decisions. The periodic manual analysis is the cost to weigh when considering this choice.

Second, you can use the Autoclass feature to automatically transition objects between storage classes. This can significantly optimize your costs by moving less frequently accessed data to cheaper storage options. This is a fine option because you don't have to think about it. If it lets you focus on your business, it's often a good idea.

Lastly, if the potential costs of a mismatch between bucket configuration and evolving data access patterns are minimal, it might be more practical to simply forgo these automated policies and tools altogether. However, remember that this decision implies a tacit acceptance of a less optimized system, one that may not respond adeptly to shifting access patterns. The decision to eschew automation in favor of simplicity comes with its own set of trade-offs. The balance between effort, cost, and optimization should be deliberate, and your choices should be made with an understanding of their long-term implications.
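Whichever of the three options you lean toward, the underlying arithmetic is the same: colder classes charge less to store and more to read. The sketch below shows that trade-off with two invented tiers. The prices are placeholders chosen to make the math easy to follow, not Google's price list, and the model ignores minimum storage durations, operation charges, and network egress.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    storage_per_gb_month: float  # price to keep 1 GB for a month
    retrieval_per_gb: float      # price to read 1 GB back


# Made-up prices for illustration only.
HOT = Tier("hot", storage_per_gb_month=0.02, retrieval_per_gb=0.0)
COLD = Tier("cold", storage_per_gb_month=0.004, retrieval_per_gb=0.02)


def monthly_cost(tier, stored_gb, read_gb):
    """Storage charge plus retrieval charge for one month."""
    return tier.storage_per_gb_month * stored_gb + tier.retrieval_per_gb * read_gb


def cheapest(tiers, stored_gb, read_gb):
    """Pick the tier with the lowest monthly cost for this access pattern."""
    return min(tiers, key=lambda t: monthly_cost(t, stored_gb, read_gb))


for read_gb in (100, 2000):
    best = cheapest([HOT, COLD], stored_gb=1000, read_gb=read_gb)
    print(read_gb, "GB read per month ->", best.name)
```

With these placeholder numbers, 1,000 GB that is rarely read belongs in the cold tier, but once monthly reads pass roughly 800 GB the hot tier wins. Shifting access patterns move you across that break-even line without anyone noticing, which is why a misclassified bucket gets expensive.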

Data, like life, is rarely static. What's frequently accessed today might become tomorrow's archive. Whether or not you are optimized perfectly depends on whether you are cash poor, time strapped, or unflappable.

Labeling and tagging: Not just for people with too much time on their hands

Tagging and labeling in Google Cloud aren’t just about organizing resources; they are powerful strategic tools. They are useful for cost analysis, resource management, and automation. Most importantly, they create a culture of orderly data practices. Sweeping in front of the store makes a big impression on your employees.

Look, we get it. You are moving fast. You have an up-and-coming startup. Who has time for labeling and tagging? Not you. Once your venture explodes, you’ll hire some coders to handle it. Or maybe, by then, an AI service will generate those tags. Either way, you’ll be on a beach somewhere. But even so, you should still treat tags and labels as first-class citizens in your cloud deployment.

Implementing tagging and labeling early on can enhance efficiency and security as your cloud environment scales. While it may seem like extra work at first, these practices enable the generation of detailed cost reports by team, environment, or project phase and make it easier to automate the application of security policies based on data sensitivity.

Now that you know the impact, here's how you can leverage these tools. Google Cloud provides two main options for organizing your Cloud Storage resources: tags and labels. Tags are key-value pairs defined at the organization or project level and attached to resources such as buckets, offering fine-grained access control and detailed cost analysis. Managed through the Resource Manager, tags can be integrated into IAM policies to enforce conditional access.

Bucket labels, on the other hand, are simpler key-value pairs stored as part of a bucket's metadata. They are used for basic organization and are easier to apply and manage. In essence, use tags when you need control and integration, and rely on labels for simplicity.

To harness their full power, start with a simple and consistent naming convention for your buckets and objects. Use labels to categorize resources by team, environment, and purpose, setting the stage for future automation and insights.
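A naming convention only helps if something enforces it. Here is a minimal sketch of a label linter you could run in CI before applying labels. The format rules in it (lowercase letters, digits, underscores, and dashes; keys starting with a letter; 63 characters at most; no more than 64 labels per bucket) reflect our understanding of the documented bucket label format, simplified to ASCII, so verify them against the current docs. The required keys are a hypothetical house convention.

```python
import re

# Assumed format rules, simplified to ASCII; verify against current docs.
_KEY_RE = re.compile(r"[a-z][a-z0-9_-]{0,62}")
_VALUE_RE = re.compile(r"[a-z0-9_-]{0,63}")
MAX_LABELS = 64

# Hypothetical house convention: every bucket carries these keys.
REQUIRED_KEYS = ("team", "env", "purpose")


def validate_labels(labels):
    """Return a list of problems with a bucket's labels (empty list means OK)."""
    errors = []
    if len(labels) > MAX_LABELS:
        errors.append(f"too many labels: {len(labels)} > {MAX_LABELS}")
    for key in REQUIRED_KEYS:
        if key not in labels:
            errors.append(f"missing required label: {key}")
    for key, value in labels.items():
        if not _KEY_RE.fullmatch(key):
            errors.append(f"invalid key: {key!r}")
        if not _VALUE_RE.fullmatch(value):
            errors.append(f"invalid value for {key!r}: {value!r}")
    return errors


print(validate_labels({"team": "data-eng", "env": "prod", "purpose": "raw-events"}))
print(validate_labels({"Team": "data-eng", "env": "Prod"}))
```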

In our experience, now is the time to start treating tags and labels as first-class citizens—or at least economy-plus class citizens. They deserve the legroom.

Your Storage Strategy: A Long-Term Investment in Data Health

In the end, your storage strategy is more than a technical decision—it's a commitment to the ongoing health of your data ecosystem. By focusing on best practices today, you ensure that your cloud infrastructure remains resilient, efficient, and ready to adapt to whatever challenges tomorrow may bring.
