How we migrated 50TB of relational data using DMS part 3

Miguel Angel Fernandez-Nieto
Engineering at FundApps
4 min read · Mar 8, 2024


Post-Migration Tasks: Denormalising Data

At its core, denormalization is the process of reducing the level of normalization in a database. In a normalized database, data is organized into multiple related tables to minimize data redundancy and improve data integrity. However, in some scenarios, particularly when dealing with complex queries or reporting requirements, normalized data structures can become a bottleneck for performance. This is where denormalization comes into play. Let’s consider a real-world SQL query that joins multiple tables.
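A simplified sketch of such a query is shown below. The table names and the join key come from our schema, but the other column names are purely illustrative and will not match our production query exactly.

-- Simplified sketch: consolidate the data date onto each result row (non-key column names are illustrative).
INSERT INTO dbo.result_denormalised_datadate (resultid, portfoliosnapshotid, resultvalue, datadate)
SELECT r.id,
       r.portfoliosnapshotid,
       r.resultvalue,
       ps.datadate
FROM dbo.result r
JOIN dbo.portfoliosnapshot ps ON ps.id = r.portfoliosnapshotid;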

In this example, we have two tables: dbo.result and dbo.portfoliosnapshot. The dbo.result table contains various results, while the dbo.portfoliosnapshot table holds the data date information.

The SQL query in question is an INSERT INTO...SELECT statement that populates a new table, dbo.result_denormalised_datadate, by selecting data from the two source tables and joining them based on a common key, portfoliosnapshotid.

In the example provided, data from multiple tables is consolidated into a single table, making it easier and faster to retrieve the required information.

To denormalize the data, we leveraged our previously discussed infrastructure, known as ANext Data Migration. We used it to enqueue a message on the SQS queue, which triggered an ECS (Elastic Container Service) task that executed the SQL query.

Post-Migration Tasks: Enabling Constraints and Synchronizing Sequences

With the data migrated, we needed to perform a few follow-up tasks. First and foremost, we had to re-enable the constraints to maintain data consistency. Fortunately, we had already implemented the task that disabled the constraints, so adapting it to either disable or enable them was straightforward: it was simply a matter of enqueuing a message with a specific payload on the SQS queue and handling it in the ECS container.
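As a rough sketch, assuming a foreign key from dbo.result to dbo.portfoliosnapshot (the constraint name here is hypothetical, and our actual statements may have differed), the kind of statements the ECS task runs look something like this:

-- Drop the foreign key before the bulk load so DMS can insert rows in any order (names are illustrative).
ALTER TABLE dbo.result DROP CONSTRAINT fk_result_portfoliosnapshot;

-- Re-create it once the migration has finished to restore referential integrity.
ALTER TABLE dbo.result
  ADD CONSTRAINT fk_result_portfoliosnapshot
  FOREIGN KEY (portfoliosnapshotid) REFERENCES dbo.portfoliosnapshot (id);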

So, once we had re-enabled SQL constraints such as primary and foreign keys, we were almost ready to switch the application to Postgres. Before doing that, however, we needed to sync the sequences: since we had migrated tables with auto-incrementing columns, we had to synchronize the sequences across all the databases. We adapted the same script to process another payload, which executes the query to sync the sequences.
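A minimal sketch of that sequence sync, assuming an auto-incrementing id column on dbo.result (table and column names are illustrative), looks like this:

-- Move the sequence to the current maximum id so new inserts don't collide with migrated rows.
SELECT setval(
  pg_get_serial_sequence('dbo.result', 'id'),
  COALESCE((SELECT MAX(id) FROM dbo.result), 1)
);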

Post-Migration Tasks: Data Validation

Once we had all the data migrated, we wanted to make sure it had been migrated correctly and that nothing could leave us with inconsistent data. AWS DMS has an option to validate data, but it did not work as expected for one of the data types we use, because of how Postgres stores compressed JSON; this is not really an issue with DMS, and in some ways it is a feature of Postgres. What we ended up doing was deploying another DMS task with Terraform (like the rest of the tasks), but this time using the ValidationSettings property so that the task did not migrate anything itself and only validated the data that had already been copied. As mentioned before, this validated almost all the tables. The few exceptions were columns that went from varchar in SQL Server to jsonb in Postgres, where DMS validation didn't work, so for those tables we compared the number of rows between source and target.
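For those tables, the check was essentially a row count on each side, along these lines (the table name is illustrative):

-- Run against both the SQL Server source and the Postgres target, then compare the two counts.
SELECT COUNT(*) FROM dbo.result_denormalised_datadate;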

Switching to Postgres: Application Deployment

Now it’s time to switch the application. We have a tenant catalog that allows us to configure different feature flags, and the application listens for changes to one of these flags during deployment to decide whether to use Postgres or the old SQL Server clusters. Once we change the feature flag, we perform a standard application deployment using Octopus to release a new version with the flag enabled.

Dealing with the downtime

The nature of our workload means we can gracefully manage a short period of downtime.

To maintain the consistency and integrity of our services, we temporarily suspended operations while the second DMS task ran, migrating the data created after the specified timestamp.

The service was reactivated as part of the Octopus deployment, keeping the disruption to a minimum. This trade-off was made after weighing our operational requirements against our commitment to a reliable experience for our users.

Conclusion

Migrating 50TB of relational data to Amazon Aurora Serverless V2 with AWS DMS tested our planning, our collaboration, and our patience. From the initial identification of scalability issues with the SQL Server clusters to the implementation of a comprehensive migration strategy, every step of the process brought challenges and lessons that we will carry into future work. We came out of it with a platform that is in a much better position to empower compliance teams worldwide, and with a foundation for the next round of improvements. Our journey doesn’t end here.
