
Streamline large binary object migrations: A Kafka-based solution for Oracle to Amazon Aurora PostgreSQL and Amazon S3


Customers migrating from on-premises Oracle databases to AWS face a challenge: efficiently moving large object data types (LOBs) to object storage while maintaining data integrity and performance. This challenge originates from the traditional enterprise database design, where LOBs are stored alongside structured data, leading to storage capacity constraints, backup complexity, and performance bottlenecks during data retrieval and processing. LOBs, which can include images, videos, and other large files, often cause traditional data migrations to suffer from slow speeds and LOB truncation issues. These issues are particularly problematic for long-running migrations that can span multiple years.

In this post, we present a scalable solution that uses Amazon Managed Streaming for Apache Kafka (Amazon MSK), Amazon Aurora PostgreSQL-Compatible Edition, and Amazon MSK Connect. Data streaming enables replication in which changes are sent and received as a continuous flow, allowing the target database to access and apply the changes in near real time. The solution generates events for database actions such as insert, update, and delete, triggering AWS Lambda functions to download LOBs from the source Oracle database and upload them to Amazon Simple Storage Service (Amazon S3) buckets. Concurrently, the streaming events migrate the structured data from the Oracle database to the target database while maintaining accurate links to the respective LOBs.

The complete implementation is available on GitHub, including AWS Cloud Development Kit (AWS CDK) deployment code, configuration files, and setup instructions.

Solution overview

Although traditional Oracle database migrations handle structured data effectively, they struggle with LOBs that can include images, videos, and documents. These migrations often fail due to size limitations and truncation issues, creating significant business risks, including data loss, extended downtime, and project delays that can force you to postpone your cloud transformation initiatives. The problem becomes more acute during long-running migrations spanning multiple years, where maintaining operational continuity is critical. This solution addresses the key challenges of LOB migration, enabling continuous, long-term operations without compromising performance or reliability.

By removing the size limitations associated with traditional migration technologies, our solution provides a robust framework that helps you seamlessly relocate LOBs while maintaining data integrity throughout the process.

Our approach uses a modern streaming architecture to alleviate the traditional constraints of Oracle LOB migration. The solution includes the following core components:

  • Amazon MSK – Provides the streaming infrastructure.
  • Amazon MSK Connect – Uses two connectors:
    • Debezium Connector for Oracle as a source connector to capture row-level changes that occur in the Oracle database. The connector emits change events and publishes them to a Kafka source topic.
    • Debezium Connector for JDBC as a sink connector to consume events from the Kafka source topic and then write those events to Aurora PostgreSQL-Compatible using a JDBC driver.
  • Lambda function – Triggered by an event source mapping to Amazon MSK. The function processes events from the Kafka source topic, extracting the Oracle row primary key from each event payload. It uses this key to download the corresponding BLOB data from the source Oracle database and uploads it to Amazon S3, organizing files into primary key folders to maintain simple linking with the relational database records.
  • Amazon RDS for Oracle – Amazon Relational Database Service (Amazon RDS) for Oracle is used as the source database to simulate an on-premises Oracle database.
  • Aurora PostgreSQL-Compatible – Used as the target database for migrated data.
  • Amazon S3 – Used as object storage for the BLOB data from the source database.
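To make the Lambda function's role concrete, the following is a minimal sketch of the event filtering and S3 key layout described above. The table list, bucket name, primary key column, and field handling are illustrative assumptions (the sketch also assumes plain JSON record values, skipping the base64 decoding an MSK event source mapping performs); the actual implementation lives in the GitHub repository.

```python
import json

# Illustrative configuration (assumed, not the repository's settings):
# tables with BLOB columns, mapped to the BLOB column name.
BLOB_TABLES = {"DOCUMENTS": "CONTENT"}
S3_BUCKET = "example-lob-bucket"  # placeholder bucket name

def should_download(event: dict) -> bool:
    """True when a Debezium event is a create (c) or update (u) on a BLOB table."""
    payload = event.get("payload", {})
    table = payload.get("source", {}).get("table", "")
    return payload.get("op") in ("c", "u") and table in BLOB_TABLES

def s3_key_for(event: dict, pk_column: str = "ID") -> str:
    """Build an S3 key organized by primary key folder, mirroring the post's layout."""
    payload = event["payload"]
    table = payload["source"]["table"]
    pk = payload["after"][pk_column]
    return f"{table}/{pk}/{BLOB_TABLES[table]}"

def handler(kafka_event: dict, context=None) -> list:
    """Process a batch of Debezium change events delivered by the MSK mapping."""
    keys = []
    for records in kafka_event.get("records", {}).values():
        for record in records:
            change = json.loads(record["value"])
            if not should_download(change):
                continue  # skip deletes and tables without BLOB columns
            key = s3_key_for(change)
            # The real function would SELECT the BLOB from Oracle by primary
            # key and upload it, e.g.:
            #   blob = fetch_blob_from_oracle(table, pk)          # hypothetical helper
            #   s3.put_object(Bucket=S3_BUCKET, Key=key, Body=blob)
            keys.append(key)
    return keys
```

Because the BLOB column is excluded from the Kafka payload, only messages matching both a configured table and a create/update operation trigger a round trip to Oracle, which keeps database interactions to a minimum.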

The following diagram shows the Oracle LOB data migration architecture.

Message flow

When data changes occur in the source Amazon RDS for Oracle database, the solution runs through the following sequence, moving through event detection and publication, BLOB processing with Lambda, and structured data processing:

  1. The Oracle source connector captures the change data capture (CDC) events, including changes to the BLOB data column. The connector is configured to exclude the BLOB data column from the Kafka event to keep the payload small.
  2. The connector publishes the event to an MSK topic.
    1. The MSK event triggers the BLOB Downloader Lambda function for the CDC events.
      1. The Lambda function checks two key conditions: the Debezium operation code (specifically create (c) or update (u)) and the configured list of Oracle BLOB table names along with their column names. When a Kafka message matches both the configured table list and a valid Debezium operation, the Lambda function initiates the BLOB data download from the Oracle source using the primary key and table name; otherwise, the function bypasses the BLOB download process. This selective approach makes sure the Lambda function only runs SQL queries when processing Kafka messages for tables containing BLOB data, optimizing database interactions.
      2. The Lambda function uploads the BLOB to Amazon S3, organizing objects into primary key folders with unique object names, which enables linking between structured database records and their corresponding BLOB data in Amazon S3.
    2. The PostgreSQL sink connector receives the event from the MSK topic.
      1. The connector applies the changes to the Aurora PostgreSQL database, except for the BLOB data column, which was excluded by the Oracle source connector.
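As a rough illustration of how the BLOB column is kept out of the Kafka payload, the two MSK Connect connector configurations can be sketched as follows (shown here as Python dictionaries). The property names come from the Debezium Oracle and JDBC connector documentation, but the endpoints, credentials, topic prefix, and table/column names are placeholder assumptions; see the GitHub repository for the actual configuration files.

```python
# Placeholder values throughout; the real configs live in the GitHub repo.
source_config = {
    "connector.class": "io.debezium.connector.oracle.OracleConnector",
    "database.hostname": "oracle.example.internal",   # assumed endpoint
    "database.port": "1521",
    "database.user": "cdc_user",
    "database.dbname": "ORCL",
    "topic.prefix": "oracle",
    "table.include.list": "APP.DOCUMENTS",            # table holding the BLOB
    # Key setting from step 1: drop the BLOB column from change events so
    # the Kafka payload stays small; the Lambda function fetches the BLOB
    # directly from Oracle instead.
    "column.exclude.list": "APP.DOCUMENTS.CONTENT",
}

sink_config = {
    "connector.class": "io.debezium.connector.jdbc.JdbcSinkConnector",
    "connection.url": "jdbc:postgresql://aurora.example.internal:5432/appdb",
    "topics": "oracle.APP.DOCUMENTS",
    "insert.mode": "upsert",            # apply creates and updates idempotently
    "primary.key.mode": "record_key",   # use the Kafka record key as the PK
}
```

With `column.exclude.list` set on the source side, the sink connector never sees the BLOB column, so steps 2a and 2b can proceed in parallel without shipping large binaries through Kafka.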

Key benefits

The solution offers the following key advantages:

  • Cost optimization and licensing – By decoupling LOB storage from the database and using Amazon S3, you can reduce your overall database footprint and lower the costs associated with traditional database licensing and replication technologies. The streaming architecture also minimizes your infrastructure overhead during long-running migrations.
  • Avoids size constraints and migration failures – Traditional migration tools often impose size limitations on LOB transfers, leading to truncation issues and failed migrations. This solution removes those constraints entirely, so you can migrate LOBs of varying sizes while maintaining data integrity. The event-driven architecture enables near real-time data replication, allowing your source systems to remain operational during migration.
  • Business continuity and operational excellence – Changes flow continuously to your target environment, supporting business continuity. The solution preserves relationships between structured database records and their corresponding LOBs through primary key-based organization in Amazon S3, maintaining referential integrity while providing the flexibility of object storage for large files.
  • Architectural advantages – Storing LOBs in Amazon S3 while keeping structured data in Aurora PostgreSQL-Compatible creates a clean separation. This architecture simplifies your backup and recovery operations, improves query performance on structured data, and provides flexible access patterns for binary objects through Amazon S3.

Implementation best practices

Consider the following best practices when implementing this solution:

  • Start small and scale gradually – Begin with a pilot project using non-production data to validate your approach before committing to a full-scale migration. This gives you a chance to work out issues in a controlled environment and refine your configuration without impacting production systems.
  • Monitoring – Set up comprehensive monitoring through Amazon CloudWatch to track key metrics like Kafka consumer lag, Lambda function errors, and replication latency. Establish alerting thresholds early so you can catch and resolve issues quickly, before they impact your migration timeline. Size your MSK cluster based on expected CDC volume, and configure Lambda reserved concurrency to handle peak loads during initial data synchronization.
  • Security – Use encryption in transit and at rest for both structured data and LOBs, and follow the principle of least privilege when setting up AWS Identity and Access Management (IAM) roles and policies for your MSK cluster, Lambda functions, S3 buckets, and database instances. Document your schema mappings between Oracle and Aurora PostgreSQL-Compatible, including how database records link to their corresponding LOBs in Amazon S3.
  • Testing and preparation – Before you go live, test your failover and recovery procedures thoroughly. Validate scenarios like Lambda function failures, MSK cluster issues, and network connectivity problems to make sure you're prepared for potential issues. Finally, remember that this streaming architecture maintains eventual consistency between your source and target systems, so there might be transient lag during high-volume periods. Plan your cutover strategy with this in mind.

Limitations and considerations

Although this solution provides a robust approach for migrating Oracle databases with LOBs to AWS, there are several inherent constraints to understand before implementation.

This solution requires network connectivity between your source Oracle database and AWS environment. For on-premises Oracle databases, you must establish AWS Direct Connect or VPN connectivity before deployment. Network bandwidth directly impacts replication speed and overall migration performance, so your connection must be able to handle the expected volume of CDC events and LOB transfers.

The solution uses the Debezium Connector for Oracle as the source connector and the Debezium Connector for JDBC as the sink connector. This architecture is specifically designed for Oracle-to-PostgreSQL migrations; other database combinations require different connector configurations or might not be supported by the current implementation. Migration throughput is also constrained by your MSK cluster capacity and Lambda concurrency limits. You might also exceed AWS service quotas for large-scale migrations and need to request quota increases through AWS Enterprise Support.

Conclusion

In this post, we presented a solution that addresses the critical challenge of migrating your large binary objects from Oracle to AWS by using a streaming architecture that separates LOB storage from structured data. This approach avoids size constraints, reduces Oracle licensing costs, and preserves data integrity throughout extended migration periods.

Ready to transform your Oracle migration strategy? Visit the GitHub repository, where you'll find the complete AWS CDK deployment code, configuration files, and step-by-step instructions to get started.


About the authors

Naresh Dhiman

Naresh is a Sr. Solutions Architect at AWS supporting US federal customers. He has over 25 years of experience as a technology leader and is a recognized inventor with six patents. He specializes in containers, machine learning, and generative AI on AWS.

Archana Sharma

Archana is a Sr. Database Specialist Solutions Architect working with Worldwide Public Sector customers. She has years of experience in relational databases and is passionate about helping customers in their journey to the AWS Cloud, with a focus on database migration and modernization.

Ron Kolwitz

Ron is a Sr. Solutions Architect supporting US Federal Government Sciences customers, including NASA and the Department of Energy. He is especially passionate about aerospace and advancing the use of generative AI and quantum-based technologies for scientific research. In his free time, he enjoys spending time with his family of avid water-skiers.

Karan Lakhwani

Karan is a Sr. Customer Solutions Manager at Amazon Web Services. He specializes in generative AI technologies and is an AWS Golden Jacket recipient. Outside of work, Karan enjoys finding new restaurants and snowboarding.
