Description
Rails style database migrations are a useful way to evolve your data schema in
an agile manner. Most Rails projects start like this, and at first, making
changes is fast and easy.
That is until your tables grow to millions of records. At this point, the
locking nature of ALTER TABLE may take your site down for an hour or more
while critical tables are migrated. In order to avoid this, developers begin
to design around the problem by introducing join tables or moving the data
into another layer. Development gets less and less agile as tables grow and
grow. To make the problem worse, adding or changing indices to optimize data
access becomes just as difficult.
Large Hadron Migrator alternatives and similar gems
Based on the "Database Tools" category.
Alternatively, view Large Hadron Migrator alternatives based on common mentions on social networks and blogs.
-
Database Cleaner
Strategies for cleaning databases in Ruby. Can be used to ensure a clean state for testing. -
Seed dump
Rails 4/5 task to dump your data to db/seeds.rb -
Foreigner
Adds foreign key helpers to migrations and correctly dumps foreign keys to schema.rb -
Lol DBA
lol_dba is a small package of rake tasks that scan your application models and displays a list of columns that probably should be indexed. Also, it can generate .sql migration scripts. -
Squasher
Squasher - squash your old migrations in a single command -
Seed Fu
Advanced seed data handling for Rails, combining the best practices of several methods together. -
Active Record Doctor
Identify database issues before they hit production. -
Seedbank
Seedbank gives your seed data a little structure. Create seeds for each environment, share seeds between environments and specify dependencies to load your seeds in order. All nicely integrated with simple rake tasks. -
BatchLoader
:zap: Powerful tool for avoiding N+1 DB or HTTP queries -
SchemaPlus
SchemaPlus provides a collection of enhancements and extensions to ActiveRecord -
Polo
Polo travels through your database and creates sample snapshots so you can work with real world data in development. -
DatabaseValidations
Database validations for ActiveRecord -
Upsert
Upsert on MySQL, PostgreSQL, and SQLite3. Transparently creates functions (UDF) for MySQL and PostgreSQL; on SQLite3, uses INSERT OR IGNORE. -
DatabaseConsistency
The tool to avoid various issues due to inconsistencies and inefficiencies between a database schema and application models. -
SecondBase
Seamless second database integration for Rails. -
OnlineMigrations
Catch unsafe PostgreSQL migrations in development and run them easier in production (code helpers for table/column renaming, changing column type, adding columns with default, background migrations, etc). -
QueryTrack
Find time-consuming database queries for ActiveRecord-based Rails Apps -
FastPage
Blazing fast pagination for ActiveRecord with deferred joins ⚡️ -
Shiba
Catch bad SQL queries before they cause problems in production -
Redis Dashboard
Sinatra app to monitor Redis servers. -
Ruby PG Extras
Ruby PostgreSQL database performance insights. Locks, index usage, buffer cache hit ratios, vacuum stats and more. -
Ruby Whatsapp SDK
Use the Ruby Whatsapp SDK to communicate with Whatsapp API using the Cloud API. Create bots to send and receive messages using the Whatsapp SDK in a few minutes. -
PgDriveBackup
Simple solution to make encrypted with ccrypt PostgreSQL backups and storing on Google Drive API -
Slack Smart Bot
Create a Slack bot that is smart and so easy to expand, create new bots on demand, run ruby code on chat, create shortcuts... The main scope of this gem is to be used internally in the company so teams can create team channels with their own bot to help them on their daily work, almost everything is suitable to be automated!! slack-smart-bot can create bots on demand, create shortcuts, run ruby code... just on a chat channel. You can access it just from your mobile phone if you want and run those tests you forgot to run, get the results, restart a server... no limits. -
ActiveRecord Setops
Union, Intersect, and Difference set operations for ActiveRecord (also, SQL's UnionAll). -
Perfect Shape
Perfect Shape is a collection of geometric algorithms that are mostly useful for GUI manipulation like checking containment of a point in popular geometric shapes such as rectangle, square, arc, circle, polygon, and paths containing lines, quadratic bézier curves, and cubic bezier curves. Also, some general math algorithms like IEEE-754 Remainder. -
PgDice
Postgres partitioning built on top of https://github.com/ankane/pgslice -
ActiveRecord::DataIntegrity
Check data integrity for your ActiveRecord models -
bundler-download
Bundler plugin for auto-downloading specified extra files after gem install -
SlackHook
A Simple Interface to Slack Incoming Webhooks Integrations
Clean code begins in your IDE with SonarLint
* Code Quality Rankings and insights are calculated and provided by Lumnify.
They vary from L1 to L5 with "L5" being the highest.
Do you think we are missing an alternative of Large Hadron Migrator or a related project?
README
Large Hadron Migrator 
Rails style database migrations are a useful way to evolve your data schema in an agile manner. Most Rails projects start like this, and at first, making changes is fast and easy.
That is until your tables grow to millions of records. At this point, the
locking nature of ALTER TABLE
may take your site down for an hour or more
while critical tables are migrated. In order to avoid this, developers begin
to design around the problem by introducing join tables or moving the data
into another layer. Development gets less and less agile as tables grow and
grow. To make the problem worse, adding or changing indices to optimize data
access becomes just as difficult.
Side effects may include black holes and universe implosion.
There are few things that can be done at the server or engine level. It is
possible to change default values in an ALTER TABLE
without locking the
table. The InnoDB Plugin provides facilities for online index creation, which
is great if you are using this engine, but only solves half the problem.
At SoundCloud we started having migration pains quite a while ago, and after looking around for third party solutions, we decided to create our own. We called it Large Hadron Migrator, and it is a gem for online ActiveRecord migrations.
The Large Hadron collider at CERN
The idea
The basic idea is to perform the migration online while the system is live, without locking the table. In contrast to OAK and the facebook tool, we only use a copy table and triggers.
The Large Hadron is a test driven Ruby solution which can easily be dropped
into an ActiveRecord migration. It presumes a single auto
incremented numerical primary key called id as per the Rails convention. Unlike
the twitter solution, it does not require the presence of an indexed
updated_at
column.
Requirements
Lhm currently only works with MySQL databases and requires an established ActiveRecord connection.
It is compatible and continuously tested with MRI 2.0.x, 2.1.x, ActiveRecord 3.2.x and 4.x (mysql and mysql2 adapters).
Limitations
Due to the Chunker implementation, Lhm requires that the table to migrate has
a single integer numeric key column called id
.
Another note about the Chunker, it performs static sized row copies against the id
column. Therefore sparse assignment of id
can cause performance problems for the
backfills. Typically LHM assumes that id
is an auto_increment
style column.
Installation
Install it via gem install lhm
or add gem "lhm"
to your Gemfile.
Usage
You can invoke Lhm directly from a plain ruby file after connecting ActiveRecord to your mysql instance:
require 'lhm'
ActiveRecord::Base.establish_connection(
:adapter => 'mysql',
:host => '127.0.0.1',
:database => 'lhm'
)
# and migrate
Lhm.change_table :users do |m|
m.add_column :arbitrary, "INT(12)"
m.add_column :locale, "VARCHAR(2) NOT NULL DEFAULT 'en'"
m.add_index [:arbitrary_id, :created_at]
m.ddl("alter table %s add column flag tinyint(1)" % m.name)
end
To use Lhm from an ActiveRecord::Migration in a Rails project, add it to your Gemfile, then invoke as follows:
require 'lhm'
class MigrateUsers < ActiveRecord::Migration
def self.up
Lhm.change_table :users do |m|
m.add_column :arbitrary, "INT(12)"
m.add_index [:arbitrary_id, :created_at]
m.ddl("alter table %s add column flag tinyint(1)" % m.name)
end
end
def self.down
Lhm.change_table :users do |m|
m.remove_index [:arbitrary_id, :created_at]
m.remove_column :arbitrary
end
end
end
Note: Lhm won't delete the old, leftover table. This is on purpose, in order to prevent accidental data loss.
Throttler
Lhm is using a throttle mechanism to read data in your original table.
By default, 40000 rows are read each 0.1 second.
If you want to change that behaviour, you can pass an instance of a throttler with the throttler
option.
In this example, 1000 rows will be read with a 10 seconds delay between each processing:
my_throttler = Lhm::Throttler::Time.new(stride: 1000, delay: 10)
Lhm.change_table :users, throttler: my_throttler do |m|
#
end
SlaveLag Throttler
Lhm uses by default the time throttler, however a better solution is to throttle the copy of the data depending on the time that the slaves are behind. To use the SlaveLag throttler:
Lhm.change_table :users, throttler: :slave_lag_throttler do |m|
#
end
Or to set that as default throttler, use the following (for instance in a Rails initializer):
Lhm.setup_throttler(:slave_lag_throttler)
Table rename strategies
There are two different table rename strategies available: LockedSwitcher and AtomicSwitcher.
The LockedSwitcher strategy locks the table being migrated and issues two ALTER TABLE statements. The AtomicSwitcher uses a single atomic RENAME TABLE query and is the favored solution.
Lhm chooses AtomicSwitcher if no strategy is specified, unless your version of MySQL is affected by binlog bug #39675. If your version is affected, Lhm will raise an error if you don't specify a strategy. You're recommended to use the LockedSwitcher in these cases to avoid replication issues.
To specify the strategy in your migration:
Lhm.change_table :users, :atomic_switch => true do |m|
# ...
end
Limiting the data that is migrated
For instances where you want to limit the data that is migrated to the new table by some conditions, you may tell the migration to filter by a set of conditions:
Lhm.change_table(:sounds) do |m|
m.filter("inner join users on users.`id` = sounds.`user_id` and sounds.`public` = 1")
end
Note that this SQL will be inserted into the copy directly after the "from" statement - so be sure to use inner/outer join syntax and not cross joins. These conditions will not affect the triggers, so any modifications to the table during the run will happen on the new table as well.
Cleaning up after an interrupted Lhm run
If an Lhm migration is interrupted, it may leave behind the temporary tables and/or triggers used in the migration. If the migration is re-started, the unexpected presence of these tables will cause an error.
In this case, Lhm.cleanup
can be used to drop any orphaned Lhm temporary tables or triggers.
To see what Lhm tables/triggers are found:
Lhm.cleanup
To remove any Lhm tables/triggers found:
Lhm.cleanup(:run)
Optionally only remove tables up to a specific Time, if you want to retain previous migrations.
Rails:
Lhm.cleanup(:run, until: 1.day.ago)
Ruby:
Lhm.cleanup(:run, until: Time.now - 86400)
Contributing
First, get set up for local development:
git clone git://github.com/soundcloud/lhm.git
cd lhm
To run the tests, follow the instructions on spec/README.
We'll check out your contribution if you:
- Provide a comprehensive suite of tests for your fork.
- Have a clear and documented rationale for your changes.
- Package these up in a pull request.
We'll do our best to help you out with any contribution issues you may have.
License
The license is included as LICENSE in this directory.
Similar solutions
*Note that all licence references and agreements mentioned in the Large Hadron Migrator README section above
are relevant to that project's source code only.