Wednesday, March 25, 2026

Unifying governance and metadata across Amazon SageMaker Unified Studio and Atlan


This post was cowritten with Satabrata Paul and Karan Singh Thakur from Atlan.

In this post, we show you how to unify governance and metadata across Amazon SageMaker Unified Studio and Atlan through a comprehensive bidirectional integration. You’ll learn how to deploy the necessary Amazon Web Services (AWS) infrastructure, configure secure connections, and set up automated synchronization to maintain consistent metadata across both platforms.

As organizations scale their data and AI programs, teams often work across distributed tools, such as governance solutions for business users and analytics or machine learning (ML) environments for technical teams. Without tight integration between these systems, metadata becomes fragmented. A single asset can appear under different names, documentation might drift out of sync, and governance signals can become inconsistent across systems.

To address these challenges, Atlan and AWS have built a bidirectional integration between Atlan and Amazon SageMaker Unified Studio. Atlan is a modern data workspace that makes collaboration easier among diverse users, such as business users, analysts, and engineers, increasing efficiency and agility in data projects. The integration creates a continuous connection between both environments so every team across the enterprise can work with a single, trusted, and synchronized view of metadata for their data and AI assets. By bridging the gap between the diverse users collaborating in Atlan and the technical teams working within Amazon SageMaker Unified Studio for analytics and ML, the integration maintains consistency across both platforms without requiring teams to switch contexts or manually reconcile metadata differences.

Why unified metadata governance matters

Enterprises today operate in hybrid environments. Business users rely on Atlan as an active metadata solution to manage, govern, and collaborate on data assets across the modern data stack. Atlan helps teams find, understand, and trust their data so they can use it effectively to drive business outcomes.

Organizations also use Amazon SageMaker Catalog to simplify discovery, governance, and collaboration for both business and technical data across structured and unstructured sources. Teams can use the catalog to organize data products, capture context, and apply governance policies consistently within Amazon SageMaker Unified Studio.

This new integration synchronizes metadata between SageMaker Catalog and Atlan, maintaining consistency and keeping content current across both environments. With a unified view, every team across the enterprise can work confidently with a single, trusted representation of their data and AI assets.

Solution overview

The solution follows a phased rollout strategy to provide you with immediate value while progressively expanding toward comprehensive data and AI governance capabilities. The current phase focuses on establishing secure, scalable, and reliable metadata synchronization between Atlan and Amazon SageMaker Unified Studio.

The Phase 1 integration between Amazon SageMaker Catalog and Atlan enables both on-demand and scheduled bidirectional metadata synchronization across the two solutions. It uses the standard APIs of Amazon SageMaker Unified Studio and Atlan to create a scalable and configurable mechanism for metadata exchange. Key capabilities include:

  • Secure connection using IAM roles – The integration is established through a managed AWS Identity and Access Management (IAM) based handshake. A predefined AWS CloudFormation template automatically provisions the IAM role and policies required to enable a secure, least-privilege connection between Amazon SageMaker Catalog and the Atlan application.
  • On-demand and scheduled synchronization – The integration supports both manual and automated metadata synchronization. API-driven workflows manage the exchange of glossary terms, asset descriptions, and classifications in both directions, keeping metadata consistent across systems.
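To make the IAM based handshake more concrete, the following Python sketch builds the general shape of a cross-account trust policy, in which one role is allowed to assume another. The post does not show the contents of the CloudFormation template, so treat this as an illustration of the mechanism only: the function name and the placeholder ARN are assumptions, and the real template may add conditions or further statements.

```python
import json


def build_trust_policy(trusted_role_arn: str) -> dict:
    """Return a trust policy document that lets one principal assume a role.

    This is the generic IAM trust policy shape, not the policy shipped in
    the integration's CloudFormation template.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # The principal that is allowed to call sts:AssumeRole.
                "Principal": {"AWS": trusted_role_arn},
                "Action": "sts:AssumeRole",
            }
        ],
    }


# Placeholder ARN for illustration; use the value from your Atlan administrator.
policy = build_trust_policy("arn:aws:iam::111122223333:role/example-atlan-node-role")
print(json.dumps(policy, indent=2))
```

The permissions granted to the role live in separate policies attached to it; the trust policy only controls who may assume the role.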

After you’ve implemented Phase 1, you can perform bidirectional synchronization of glossary terms and descriptions between Amazon SageMaker Unified Studio and Atlan. This keeps your terminology consistent across both platforms, and your teams can maintain a single source of truth for business definitions. The integration also preserves your glossary structures, including parent-child relationships, so your carefully organized taxonomy stays intact during the sync process. Additionally, glossary terms are automatically associated with related data assets, saving you the manual effort of linking terms to the appropriate datasets and reducing the risk of inconsistencies.
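One way to picture what preserving parent-child relationships involves is ordering terms so that every parent is created before its children. The sketch below is a hypothetical illustration of that idea, not the connector's implementation: the `terms` mapping (term name to parent name) and the function name are assumptions made for this example.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def parents_first(terms: Dict[str, Optional[str]]) -> List[str]:
    """Order term names so that each parent precedes its children.

    `terms` maps a term name to its parent name, or None for top-level terms.
    """
    children = defaultdict(list)
    for name, parent in terms.items():
        children[parent].append(name)

    ordered: List[str] = []
    # Depth-first walk from the top-level terms, alphabetical within a level.
    stack = sorted(children[None], reverse=True)
    while stack:
        name = stack.pop()
        ordered.append(name)
        stack.extend(sorted(children[name], reverse=True))
    return ordered


# Hypothetical glossary used only for illustration.
glossary = {
    "Customer": None,
    "Customer ID": "Customer",
    "Revenue": None,
    "Net revenue": "Revenue",
}
print(parents_first(glossary))
```

Writing terms in this order means a child never refers to a parent that does not exist yet on the receiving side.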

Beyond glossary management, Phase 1 enables comprehensive ingestion of assets and metadata from Amazon SageMaker Unified Studio into Atlan. This includes your projects, both published and subscribed assets, domains and data products, glossaries and terms, metadata forms, and column descriptions. By bringing this information into Atlan, you create a unified view of your data landscape that makes it easier for data consumers to discover, understand, and trust the data they’re working with.

Prerequisites

To follow along with this integration setup, you must have the following resources already configured in your environment:

  • An Atlan tenant
  • A node group IAM role
  • An Amazon SageMaker Unified Studio domain
  • At least one Amazon SageMaker Unified Studio project with assets created and glossary terms defined
  • An Atlan API token. You can generate one by navigating to API access in Atlan’s Admin center.
  • An Atlan top-level glossary. You can create this glossary container in Atlan to ingest SageMaker Unified Studio glossaries and terms.

The next section presents a step-by-step walkthrough of the integration, from initial setup to full operation. It demonstrates how to establish the trust handshake between Amazon SageMaker Unified Studio and Atlan and how bidirectional synchronization works in practice.

Setup on AWS

To begin the integration, you need Atlan’s Account Node Instance IAM role. This role allows the Atlan SageMaker Unified Studio application to securely assume the IAM role that you will create in your AWS account using an AWS CloudFormation template. The trust relationship between these two roles authorizes Atlan to publish metadata to Amazon SageMaker Catalog and to perform reverse synchronization from AWS back into Atlan.

The IAM policy follows the principle of least privilege, granting Atlan access only to the resources necessary for cataloging and governance. This approach maintains accurate metadata synchronization while preserving your existing cloud security and compliance controls.

Follow AWS best practices when configuring trust relationships. These cross-account access mechanisms require careful management and monitoring, particularly during security incidents. For comprehensive guidance on securing IAM roles and trust policies, refer to Security best practices in IAM and Require workloads to use temporary credentials with IAM roles to access AWS.

Contact your Atlan administrator to obtain the Amazon Resource Name (ARN) of the Atlan Account Node Instance IAM role. You will need this value when configuring the CloudFormation stack in AWS.
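Because the role ARN is copied by hand between teams, a quick format check before you paste it into the stack parameters can catch truncated or mistyped values. The following is a small sketch that relies on the standard IAM role ARN layout (`arn:<partition>:iam::<account-id>:role/<path-and-name>`); the function name and the sample ARN are placeholders chosen for this example.

```python
import re

# Standard layout of an IAM role ARN, with an optional path before the name.
IAM_ROLE_ARN = re.compile(
    r"^arn:(aws|aws-cn|aws-us-gov):iam::(\d{12}):role/([\w+=,.@/-]+)$",
    re.ASCII,
)


def parse_role_arn(arn: str) -> dict:
    """Split an IAM role ARN into its parts, or raise ValueError if malformed."""
    match = IAM_ROLE_ARN.match(arn.strip())
    if match is None:
        raise ValueError(f"not a valid IAM role ARN: {arn!r}")
    partition, account_id, role_path = match.groups()
    return {
        "partition": partition,
        "account_id": account_id,
        # The role name is the last segment after any path components.
        "role_name": role_path.rsplit("/", 1)[-1],
    }


# Placeholder ARN for illustration only.
print(parse_role_arn("arn:aws:iam::111122223333:role/service/example-role"))
```

This only validates the shape of the value. It does not confirm that the role exists or that it is the role Atlan uses.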

The next step is to create an IAM role using the provided CloudFormation template. This role establishes the trust relationship between your Amazon SageMaker Unified Studio environment and your Atlan tenant. Follow these steps:

  1. Access the CloudFormation template. The CloudFormation template is currently available as a YAML file.
  2. On the AWS Management Console, navigate to CloudFormation and choose Create stack, then choose With new resources (standard), as shown in the following screenshot.
  3. Choose the provided CloudFormation template and choose Next.

  4. Enter a name for the stack and complete the required parameters, as shown in the following screenshot:
    1. AtlanNodeInstanceRoleArn – The ARN of the Atlan node instance role.
    2. SMUSDomainId – The unique identifier for the SageMaker Unified Studio domain.
    3. SMUSProjectsToSync – The IDs of the projects where synchronization between SageMaker Unified Studio and Atlan will be enabled. You can either add the project IDs here and update this stack each time a project is added, or add the created IAM role to each project as an owner.

  5. Select the acknowledgement checkbox and choose Next, as shown in the following screenshot.

  6. Choose Submit to start the stack deployment. When the process is complete, the stack status will update to CREATE_COMPLETE.
  7. Note the IAM role ARN. After the CloudFormation stack has been deployed and the IAM role has been created, copy the IAM role ARN from the CloudFormation output. You will need this value during the configuration process on the Atlan side to establish the secure connection between your Amazon SageMaker Unified Studio environment and your Atlan tenant.
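If you prefer to script the console steps above, the same inputs and outputs can be handled in code. The sketch below builds the parameter list in the shape that the CloudFormation CreateStack API expects and reads the role ARN from one entry of a DescribeStacks response. Several details are assumptions rather than facts from the template: that `SMUSProjectsToSync` accepts a comma-separated string, and that the output key is named `RoleArn`. All IDs and ARNs are placeholders.

```python
from typing import Dict, List


def stack_parameters(
    atlan_role_arn: str, domain_id: str, project_ids: List[str]
) -> List[Dict[str, str]]:
    """Build the Parameters list for a CloudFormation CreateStack call."""
    return [
        {"ParameterKey": "AtlanNodeInstanceRoleArn", "ParameterValue": atlan_role_arn},
        {"ParameterKey": "SMUSDomainId", "ParameterValue": domain_id},
        # Assumption: the template takes project IDs as one comma-separated value.
        {"ParameterKey": "SMUSProjectsToSync", "ParameterValue": ",".join(project_ids)},
    ]


def role_arn_from_stack(stack: dict, output_key: str = "RoleArn") -> str:
    """Return the role ARN from one stack entry of a DescribeStacks response.

    `output_key` is a hypothetical name; check the Outputs tab of your stack.
    """
    status = stack.get("StackStatus")
    if status != "CREATE_COMPLETE":
        raise RuntimeError(f"stack is not ready: {status}")
    for output in stack.get("Outputs", []):
        if output["OutputKey"] == output_key:
            return output["OutputValue"]
    raise KeyError(output_key)


# Placeholder values for illustration only.
params = stack_parameters(
    "arn:aws:iam::111122223333:role/example-atlan-node-role",
    "dzd_exampledomain",
    ["proj-a", "proj-b"],
)
sample_stack = {
    "StackStatus": "CREATE_COMPLETE",
    "Outputs": [
        {"OutputKey": "RoleArn",
         "OutputValue": "arn:aws:iam::444455556666:role/example-sync-role"}
    ],
}
print(role_arn_from_stack(sample_stack))
```

With boto3, `params` would be passed as the `Parameters` argument of the CloudFormation client's `create_stack` call, and `sample_stack` corresponds to `describe_stacks(StackName=...)["Stacks"][0]`. A template that creates IAM resources also requires the matching `Capabilities` value, which is what the acknowledgement checkbox in the console represents.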

Setup on Atlan

Now that you’ve deployed the necessary AWS resources, you’ll configure Atlan to establish the connection with Amazon SageMaker Unified Studio. This involves setting up the API token, configuring the IAM role, and creating the glossary container that will receive your synchronized metadata. Follow these steps:

  1. Sign in to your Atlan tenant, as shown in the following screenshot.

  2. On the New dropdown menu, choose New workflow.

  3. On the Marketplace tab, search for and select the AWS SageMaker Unified Studio app, as shown in the following screenshot.

  4. Enter credential details. Use the IAM role or user created earlier by the CloudFormation template, enter an API token, and choose your AWS Region, as shown in the following screenshot.

  5. Enter connection details. In Connection name, enter a name. Under Connection Admins, choose the plus icon to add members (other users) to the connection as admins. Assigning admin permissions to the connection allows these users to:
    1. View and edit the assets in the connection.
    2. Edit connection preferences.
    3. Edit persona-based policies for the connection.

  6. Choose metadata filters and preflight checks, as shown in the following screenshot:
    • In the Choose Glossary to enrich dropdown menu, choose the glossary container in Atlan to be enriched with glossaries and terms from SageMaker Unified Studio.
    • To check for the permissions required to run the workflow, select the quick check for necessary permissions before the workflow run.
    • To run the workflow, choose Run. To schedule it to run later, choose Schedule & Run.

Synchronization of metadata

Now that you’ve configured the integration between Atlan and Amazon SageMaker Unified Studio, let’s explore how metadata flows bidirectionally between both platforms to maintain consistency and governance across your data landscape.

The Atlan SageMaker Unified Studio connector uses a bidirectional synchronization model that keeps business context and technical metadata consistent across both solutions. The process delivers reliability, traceability, and governance-safe updates, regardless of where changes originate. The following diagram illustrates the solution architecture.

Sequential workflow for the SageMaker Unified Studio Atlan integration

The integration between SageMaker Unified Studio and Atlan follows a carefully orchestrated sequential workflow that enables seamless metadata synchronization across both platforms.

The process begins with connection setup through IAM, where authentication and authorization are configured to establish secure access between the customer’s AWS account and Atlan’s AWS environment. This foundational security layer allows subsequent data exchanges to occur within a trusted framework.

After the connection is established, the metadata sync workflow can be triggered either on a defined schedule or manually by the user, providing flexibility based on organizational needs. When triggered, the Atlan SageMaker Unified Studio app calls the SageMaker Unified Studio APIs to ingest assets and metadata from the source system.

The ingested assets then undergo processing and transformation within Atlan, where they’re converted into Atlan’s metadata model. This processing step is crucial because it makes the assets discoverable, searchable, and governable inside the Atlan platform, which means teams can use Atlan’s full governance capabilities.

A key capability of this integration is its real-time reverse sync for metadata updates. When a user modifies metadata for the assets within Atlan (such as adding tags or updating descriptions), Atlan’s real-time reverse sync pipelines immediately detect these changes and push the updates back to SageMaker Unified Studio. This keeps SageMaker Unified Studio reflecting the most up-to-date metadata entered by users in Atlan, eliminating the risk of metadata drift between systems.

This bidirectional sync creates a continuous loop where metadata flows from SageMaker Unified Studio to Atlan for ingestion and publication, while simultaneously flowing back from Atlan to SageMaker Unified Studio through real-time reverse sync. The result is a consistent, bidirectional metadata flow that keeps both platforms synchronized. Teams can work confidently knowing that their metadata governance efforts are reflected across their data.

The following diagram illustrates this complete workflow, showing how metadata moves through each stage of the integration, from initial IAM authentication through the continuous bidirectional sync loop that maintains metadata consistency across both platforms.

SageMaker Unified Studio to Atlan: Ingestion of metadata

The Atlan SageMaker Unified Studio app periodically connects to SageMaker Unified Studio using secure API calls to ingest metadata. This metadata is transformed and mapped into Atlan’s metadata model, then published through the Atlan publish app as new or updated assets.

Each ingestion cycle is fully logged by Atlan’s audit service, which captures timestamps, correlation IDs, and the full change record. These logs support deduplication, troubleshooting, and replay in the event of partial failures.
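To illustrate how timestamps and correlation IDs can support deduplication and replay, the sketch below keeps only the latest record per correlation ID and then selects the changes whose latest attempt failed. The record fields, status values, and function names are assumptions made for this example; the post does not describe the schema of Atlan's audit service.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class ChangeRecord:
    correlation_id: str   # identifies one logical change across retries
    asset_id: str
    timestamp: datetime
    status: str           # hypothetical values: "applied" or "failed"


def latest_per_correlation(records: Iterable[ChangeRecord]) -> Dict[str, ChangeRecord]:
    """Deduplicate by keeping the most recent record for each correlation ID."""
    latest: Dict[str, ChangeRecord] = {}
    for record in records:
        current = latest.get(record.correlation_id)
        if current is None or record.timestamp > current.timestamp:
            latest[record.correlation_id] = record
    return latest


def needs_replay(records: Iterable[ChangeRecord]) -> List[ChangeRecord]:
    """Return changes whose latest attempt failed, oldest first."""
    failed = [r for r in latest_per_correlation(records).values() if r.status == "failed"]
    return sorted(failed, key=lambda r: r.timestamp)


start = datetime(2026, 1, 1, tzinfo=timezone.utc)
log = [
    ChangeRecord("c-1", "asset-a", start, "failed"),
    ChangeRecord("c-1", "asset-a", start + timedelta(minutes=1), "applied"),  # retry succeeded
    ChangeRecord("c-2", "asset-b", start + timedelta(minutes=2), "failed"),
]
print([r.correlation_id for r in needs_replay(log)])
```

In this sample, `c-1` is skipped because its retry succeeded, so only `c-2` is selected for replay.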

Atlan to SageMaker Unified Studio: Synchronizing enriched business context

When users enrich assets within Atlan, for example by updating descriptions or attaching glossary terms, the integration detects these changes and selectively pushes them back to SageMaker Unified Studio.

The reverse sync control plane is a pipeline that automatically detects changes made to assets and then triggers SageMaker Unified Studio update API calls in the background to keep everything synchronized.
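The idea of detecting changes and pushing back only what is relevant can be sketched as a comparison between two snapshots of an asset, restricted to the fields that take part in the sync. This is a simplified illustration, not the connector's pipeline: the field names, the `SYNCED_FIELDS` tuple, and the snapshot dictionaries are all hypothetical.

```python
from typing import Any, Dict, Iterable

# Hypothetical set of fields that the reverse sync is allowed to push back.
SYNCED_FIELDS = ("description", "glossary_terms")


def changed_fields(
    before: Dict[str, Any],
    after: Dict[str, Any],
    fields: Iterable[str] = SYNCED_FIELDS,
) -> Dict[str, Any]:
    """Return the new values of synced fields that differ between snapshots."""
    return {
        name: after.get(name)
        for name in fields
        if before.get(name) != after.get(name)
    }


before = {"description": "Orders", "glossary_terms": ["Order"], "owner": "alice"}
after = {"description": "Customer orders", "glossary_terms": ["Order"], "owner": "bob"}

update = changed_fields(before, after)
print(update)
```

Here the owner change is ignored because `owner` is not in the synced set, and an empty result means there is nothing to push.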

What’s next?

Phase 1 delivers core metadata synchronization and primary catalog selection for immediate consistency across your data governance platforms. Phase 2 will synchronize lineage and data quality, so teams see the same data flows and quality signals in both Atlan and SageMaker Catalog. This will enable end-to-end visibility into how data moves through your pipelines and keep quality metrics consistently tracked across both systems. Phase 3 will add integrated approval workflows to streamline how access is requested and granted across solutions, reducing friction for data consumers while maintaining strong governance controls. These upcoming phases build toward a fully connected governance experience, keeping metadata, lineage, quality, and access policies aligned across the modern data stack.

Cleanup

If you no longer need the SageMaker Unified Studio connector integration, complete the following steps to clean up your environment and avoid unintended resource usage:

  1. Delete the CloudFormation stack. Navigate to the AWS CloudFormation console, locate the stack deployed for this solution, and choose Delete. This action removes the AWS resources provisioned by the stack, including IAM roles, policies, and supporting components.
  2. Remove the connection in Atlan. Refer to Delete a connection in Atlan’s documentation and follow the steps to delete the associated connection.

Cleaning up these components keeps your AWS and Atlan environments streamlined, secure, and cost-efficient.

Conclusion

In this post, you learned how to establish a bidirectional integration between Atlan and Amazon SageMaker Unified Studio that unifies metadata governance across your data and AI environments. You walked through deploying the necessary AWS infrastructure using CloudFormation, configuring the secure IAM based connection, and setting up bidirectional synchronization to keep glossary terms, descriptions, and governance context aligned across both platforms.

Organizations can use this integration to connect business and technical users within a single governance framework, creating a consistent, trusted view of data across the enterprise. With one secure configuration, teams can synchronize metadata between Atlan and Amazon SageMaker Unified Studio, establishing a reliable foundation for innovation, collaboration, and responsible AI at scale.


About the authors

Karan Singh Thakur

Karan is a Senior Product Manager at Atlan, leading the strategy and execution for deep hyperscaler integrations, specifically across AWS. Before Atlan, Karan spent over a decade building cloud-based, data-intensive environments, including serving as the founding PM for a fully managed lakehouse engine and leading enterprise analytics, governance, and Kubernetes-based workload systems.

Satabrata Paul

Satabrata is a Senior Software Engineer on Atlan’s Metadata Marketplace team, where he designs and scales backend systems and CI/CD workflows for high-quality metadata connector integrations. Focused on modern data environments, he helps teams streamline asset discovery, lineage, and cataloging across complex environments.

Divij Bhatia

Divij is a Software Development Engineer at Amazon Web Services (AWS). He’s passionate about building resilient and scalable cloud-based solutions that solve real-world problems for customers. His free time often takes him outdoors, traveling and photographing landscapes.

Leonardo Gomez

Leonardo is a Principal Analytics Specialist Solutions Architect at Amazon Web Services (AWS). He has over a decade of experience in data management, helping customers around the globe address their business and technical needs.
