Loading initial data with Django 1.7 and data migrations

crosscheck 2020. 9. 4. 06:54

I recently switched from Django 1.6 to 1.7 and started using migrations (I never used South).

Before 1.7, I loaded initial data from a fixture/initial_data.json file, which was picked up by the python manage.py syncdb command (when creating the database).

Now that I have started using migrations, this behavior is deprecated:

If an application uses migrations, there is no automatic loading of fixtures. Since migrations will be required for applications in Django 2.0, this behavior is considered deprecated. If you want to load initial data for an app, consider doing it in a data migration. ( https://docs.djangoproject.com/en/1.7/howto/initial-data/#automatically-loading-initial-data-fixtures )

The official documentation does not have a clear example of how to do it, so my question is:

What is the best way to import such initial data using data migrations?

  1. Write Python code with multiple calls to mymodel.create(...),
  2. Use or write a Django function (such as a call to loaddata) to load the data from a JSON fixture file.

I prefer the second option.
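For comparison, option 1 might look roughly like the sketch below. The app label `yourapp`, the model name `Country`, its `name` field, and the row values are all hypothetical. The function is meant to be passed to `migrations.RunPython` inside a migration file, where Django supplies the `apps` and `schema_editor` arguments.

```python
# Hypothetical rows; replace them with your own initial data.
INITIAL_COUNTRIES = ["France", "Japan", "Peru"]


def create_initial_countries(apps, schema_editor):
    # Ask the migration framework for the historical model instead of
    # importing it from yourapp.models.
    Country = apps.get_model("yourapp", "Country")
    for name in INITIAL_COUNTRIES:
        # get_or_create keeps the function harmless if it runs twice.
        Country.objects.get_or_create(name=name)
```

This keeps the data next to the code that loads it, at the cost of hand-writing every row.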

I don't want to use South, since Django now seems able to do this natively.


Update: see @GwynBleidD's comment below for the problems this solution can cause, and see @Rockallite's answer below for an approach that is more durable to future model changes.


Assuming you have a fixture file in <yourapp>/fixtures/initial_data.json
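For reference, a Django JSON fixture is a list of records, each with "model", "pk" and "fields" keys. The sketch below builds a minimal one with the standard library; the model label `yourapp.modelname` and the `name` field are placeholders, not names from the question.

```python
import json

# Placeholder model label and field; each record follows the
# "model" / "pk" / "fields" layout used by Django's JSON serializer.
fixture = [
    {"model": "yourapp.modelname", "pk": 1, "fields": {"name": "first"}},
    {"model": "yourapp.modelname", "pk": 2, "fields": {"name": "second"}},
]

fixture_text = json.dumps(fixture, indent=2)
print(fixture_text)  # this text would be saved as initial_data.json
```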

  1. Create an empty migration:

    In Django 1.7:

    python manage.py makemigrations --empty <yourapp>
    

    In Django 1.8+, you can provide a name:

    python manage.py makemigrations --empty <yourapp> --name load_initial_data

  2. Edit your migration file <yourapp>/migrations/0002_auto_xxx.py

    2.1. Custom implementation, inspired by Django's loaddata (initial answer):

    import os
    from django.core import serializers
    from django.db import migrations
    
    fixture_dir = os.path.abspath(os.path.join(os.path.dirname(__file__), '../fixtures'))
    fixture_filename = 'initial_data.json'
    
    def load_fixture(apps, schema_editor):
        fixture_file = os.path.join(fixture_dir, fixture_filename)
    
        # The with block closes the file even if deserialization fails.
        with open(fixture_file, 'rb') as fixture:
            objects = serializers.deserialize('json', fixture, ignorenonexistent=True)
            for obj in objects:
                obj.save()
    
    def unload_fixture(apps, schema_editor):
        "Brutally deleting all entries for this model..."
    
        MyModel = apps.get_model("yourapp", "ModelName")
        MyModel.objects.all().delete()
    
    class Migration(migrations.Migration):  
    
        dependencies = [
            ('yourapp', '0001_initial'),
        ]
    
        operations = [
            migrations.RunPython(load_fixture, reverse_code=unload_fixture),
        ]
    

    2.2. A simpler solution for load_fixture (per @juliocesar's suggestion):

    import os
    from django.core.management import call_command
    
    fixture_dir = os.path.abspath(os.path.join(os.path.dirname(__file__), '../fixtures'))
    fixture_filename = 'initial_data.json'
    
    def load_fixture(apps, schema_editor):
        fixture_file = os.path.join(fixture_dir, fixture_filename)
        call_command('loaddata', fixture_file) 
    

    Useful if you want to use a custom directory.

    2.3. Simplest: calling loaddata with app_label loads fixtures from <yourapp>'s fixtures directory automatically:

    from django.core.management import call_command
    
    fixture = 'initial_data'
    
    def load_fixture(apps, schema_editor):
        call_command('loaddata', fixture, app_label='yourapp') 
    

    If you don't specify app_label, loaddata will try to load the fixture filename from every app's fixtures directory (which you probably don't want).

  3. Run it

    python manage.py migrate <yourapp>
    

Short version

You should NOT use the loaddata management command directly in a data migration.

# Bad example for a data migration
from django.db import migrations
from django.core.management import call_command


def load_fixture(apps, schema_editor):
    # No, it's wrong. DON'T DO THIS!
    call_command('loaddata', 'your_data.json', app_label='yourapp')


class Migration(migrations.Migration):
    dependencies = [
        # Dependencies to other migrations
    ]

    operations = [
        migrations.RunPython(load_fixture),
    ]

Long version

loaddata utilizes django.core.serializers.python.Deserializer which uses the most up-to-date models to deserialize historical data in a migration. That's incorrect behavior.

For example, suppose there is a data migration that uses the loaddata management command to load data from a fixture, and it has already been applied in your development environment.

Later, you decide to add a new required field to the corresponding model, so you do it and make a new migration against your updated model (and possibly provide a one-off value to the new field when ./manage.py makemigrations prompts you).

You run the next migration, and all is well.

Finally, you're done developing your Django application, and you deploy it on the production server. Now it's time to run all the migrations from scratch in the production environment.

However, the data migration fails. That's because the deserialized model from loaddata command, which represents the current code, can't be saved with empty data for the new required field you added. The original fixture lacks necessary data for it!

But even if you update the fixture with the required data for the new field, the data migration still fails. When the data migration runs, the later migration that adds the corresponding column to the database has not been applied yet. You can't save data to a column that does not exist!

Conclusion: in a data migration, the loaddata command introduces potential inconsistency between the model and the database. You should definitely NOT use it directly in a data migration.
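The second failure can be reproduced without Django. The sketch below uses sqlite3 with a hypothetical table `yourapp_country` and a hypothetical new column `code`: the table is in the state the data migration sees, while the INSERT is shaped by the up-to-date model.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Database state when the data migration runs: the later migration that
# adds the "code" column has not been applied yet.
conn.execute(
    "CREATE TABLE yourapp_country (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)


def insert_with_current_model(connection):
    # The current model already knows about "code", so the INSERT names it.
    connection.execute(
        "INSERT INTO yourapp_country (name, code) VALUES (?, ?)",
        ("France", "FR"),
    )


try:
    insert_with_current_model(conn)
    error_message = None
except sqlite3.OperationalError as exc:
    error_message = str(exc)

print(error_message)  # the database rejects the unknown column

# Once the schema migration has added the column, the same INSERT works.
conn.execute("ALTER TABLE yourapp_country ADD COLUMN code TEXT")
insert_with_current_model(conn)
row_count = conn.execute("SELECT COUNT(*) FROM yourapp_country").fetchone()[0]
```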

The Solution

The loaddata command relies on the django.core.serializers.python._get_model function to get the corresponding model from a fixture, and that function returns the most up-to-date version of a model. We need to monkey-patch it so that it returns the historical model.

(The following code works for Django 1.8.x)

# Good example for a data migration
from django.db import migrations
from django.core.serializers import base, python
from django.core.management import call_command


def load_fixture(apps, schema_editor):
    # Save the old _get_model() function
    old_get_model = python._get_model

    # Define new _get_model() function here, which utilizes the apps argument to
    # get the historical version of a model. This piece of code is directly stolen
    # from django.core.serializers.python._get_model, unchanged. However, here it
    # has a different context, specifically, the apps variable.
    def _get_model(model_identifier):
        try:
            return apps.get_model(model_identifier)
        except (LookupError, TypeError):
            raise base.DeserializationError("Invalid model identifier: '%s'" % model_identifier)

    # Replace the _get_model() function on the module, so loaddata can utilize it.
    python._get_model = _get_model

    try:
        # Call loaddata command
        call_command('loaddata', 'your_data.json', app_label='yourapp')
    finally:
        # Restore old _get_model() function
        python._get_model = old_get_model


class Migration(migrations.Migration):
    dependencies = [
        # Dependencies to other migrations
    ]

    operations = [
        migrations.RunPython(load_fixture),
    ]
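The save, replace and restore steps in load_fixture can also be packaged as a small context manager. This is only a sketch of that pattern: `patched_attribute` is a hypothetical helper, and a `SimpleNamespace` stands in for the serializer module so the example runs without Django.

```python
from contextlib import contextmanager
from types import SimpleNamespace


@contextmanager
def patched_attribute(target, name, replacement):
    """Temporarily replace target.<name>, restoring it even on error."""
    original = getattr(target, name)
    setattr(target, name, replacement)
    try:
        yield
    finally:
        setattr(target, name, original)


# Stand-in for django.core.serializers.python (an assumption for the demo).
fake_module = SimpleNamespace(_get_model=lambda identifier: "current:" + identifier)

with patched_attribute(
    fake_module, "_get_model", lambda identifier: "historical:" + identifier
):
    inside = fake_module._get_model("yourapp.modelname")

outside = fake_module._get_model("yourapp.modelname")
```

In the migration above, the try/finally block could then be written as a `with patched_attribute(python, '_get_model', _get_model):` block around the call_command line, under the same Django 1.8.x assumption the answer states.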

Inspired by some of the comments (namely n__o's) and the fact that I have a lot of initial_data.* files spread out over multiple apps I decided to create a Django app that would facilitate the creation of these data migrations.

Using django-migration-fixture you can simply run the following management command and it will search through all your INSTALLED_APPS for initial_data.* files and turn them into data migrations.

./manage.py create_initial_data_fixtures
Migrations for 'eggs':
  0002_auto_20150107_0817.py:
Migrations for 'sausage':
  Ignoring 'initial_data.yaml' - migration already exists.
Migrations for 'foo':
  Ignoring 'initial_data.yaml' - not migrated.

See django-migration-fixture for install/usage instructions.


In order to give your database some initial data, write a data migration. In the data migration, use the RunPython function to load your data.

Don't write any loaddata command as this way is deprecated.

Your data migrations will be run only once. Migrations are an ordered sequence. When the 003_xxxx.py migration is run, Django records in the database that this app has been migrated up to that point (003), and afterwards it runs only the migrations that follow it.
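The run-once behavior can be pictured with a toy model. This is not Django's executor, only a sketch of the bookkeeping idea; the migration names and the `run_pending` helper are made up for illustration.

```python
def run_pending(migrations_in_order, applied):
    """Run migrations not yet recorded in `applied`; return the names run."""
    ran = []
    for name, operation in migrations_in_order:
        if name in applied:
            continue  # already recorded, so it is skipped
        operation()
        applied.add(name)  # Django keeps this record in the django_migrations table
        ran.append(name)
    return ran


loaded = []
plan = [
    ("0001_initial", lambda: None),
    ("0002_load_initial_data", lambda: loaded.append("fixture")),
    ("0003_xxxx", lambda: None),
]

applied = set()
first_run = run_pending(plan, applied)   # everything runs once
second_run = run_pending(plan, applied)  # nothing is left to run
```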


The solutions presented above didn't work for me unfortunately. I found that every time I change my models I have to update my fixtures. Ideally I would instead write data migrations to modify created data and fixture-loaded data similarly.

To facilitate this I wrote a quick function which will look in the fixtures directory of the current app and load a fixture. Put this function into a migration in the point of the model history that matches the fields in the migration.
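The function itself is not reproduced here. A rough sketch of the idea, not the author's code, could look like the following. It reads the JSON fixture with the standard library and creates rows through the historical models from `apps`; both helper names are hypothetical, and it handles only plain fields, not foreign keys, many-to-many fields, or natural keys.

```python
import json
import os


def fixture_path_for(migration_file, fixture_name):
    """Resolve <app>/fixtures/<fixture_name> from <app>/migrations/<file>."""
    app_dir = os.path.dirname(os.path.dirname(os.path.abspath(migration_file)))
    return os.path.join(app_dir, "fixtures", fixture_name)


def load_json_fixture(apps, fixture_path):
    """Create one row per fixture record using historical models."""
    with open(fixture_path, encoding="utf-8") as handle:
        records = json.load(handle)
    for record in records:
        # "model" looks like "yourapp.modelname".
        app_label, model_name = record["model"].split(".")
        Model = apps.get_model(app_label, model_name)
        Model.objects.create(pk=record.get("pk"), **record["fields"])
```

Inside a migration, a RunPython callable could call `load_json_fixture(apps, fixture_path_for(__file__, 'initial_data.json'))`. Because the models come from `apps`, the fixture only has to match the schema at that point in the migration history.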


In my opinion fixtures are a bit bad. If your database changes frequently, keeping them up-to-date will soon become a nightmare. Actually, it's not only my opinion; the book "Two Scoops of Django" explains it much better.

Instead I'll write a Python file to provide initial setup. If you need something more I'll suggest you look at Factory boy.
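As a rough picture of the factory idea, the sketch below uses a plain function rather than Factory boy. The `make_country` helper and its `name` and `code` fields are hypothetical; the point is that defaults live in code and each caller overrides only what it cares about.

```python
import itertools

_sequence = itertools.count(1)


def make_country(**overrides):
    """Build field values for one hypothetical Country row."""
    n = next(_sequence)
    values = {"name": "Country %d" % n, "code": "C%02d" % n}
    values.update(overrides)  # explicit arguments win over the defaults
    return values


first = make_country()
second = make_country(name="Peru")
```

When a field is added to the model, only the defaults in the factory change, instead of every record in a fixture file.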

If you need to migrate some data you should use data migrations.

There's also "Burn Your Fixtures, Use Model Factories" about using fixtures.


On Django 2.1, I wanted to load some models (Like country names for example) with initial data.

But I wanted this to happen automatically right after the execution of initial migrations.

So I thought that it would be great to have an sql/ folder inside each application that required initial data to be loaded.

Then within that sql/ folder I would have .sql files with the required DMLs to load the initial data into the corresponding models, for example:

INSERT INTO appName_modelName(fieldName)
VALUES
    ('country 1'),
    ('country 2'),
    ('country 3'),
    ('country 4');

To be more descriptive, this is how an app containing an sql/ folder would look: [image: app directory layout with an sql/ folder]

Also I found some cases where I needed the sql scripts to be executed in a specific order. So I decided to prefix the file names with a consecutive number as seen in the image above.
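One caveat with numeric prefixes: a plain sorted() compares names as text, so an unpadded 10 sorts before 2. The sketch below shows the difference and one possible sort key. The file names and the `numeric_prefix_key` helper are hypothetical. Zero-padding the prefixes (01_, 02_, 10_) avoids the problem without a custom key.

```python
import re


def numeric_prefix_key(filename):
    """Sort key: leading integer first, then the file name itself."""
    match = re.match(r"(\d+)", filename)
    number = int(match.group(1)) if match else float("inf")
    return (number, filename)


files = ["10_cities.sql", "2_regions.sql", "1_countries.sql"]

plain = sorted(files)                            # text order
numeric = sorted(files, key=numeric_prefix_key)  # numeric order
```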

Then I needed a way to load any SQLs available inside any application folder automatically by doing python manage.py migrate.

So I created another application named initial_data_migrations and added it to the list of INSTALLED_APPS in the settings.py file. Then I created a migrations folder inside it and added a file called run_sql_scripts.py (which is actually a custom migration):

[image: initial_data_migrations app containing migrations/run_sql_scripts.py]

I created run_sql_scripts.py so that it takes care of running all sql scripts available within each application. This one is then fired when someone runs python manage.py migrate. This custom migration also adds the involved applications as dependencies, that way it attempts to run the sql statements only after the required applications have executed their 0001_initial.py migrations (We don't want to attempt running a SQL statement against a non-existent table).

Here is the source of that script:

import os
import itertools

from django.db import migrations
from YourDjangoProjectName.settings import BASE_DIR, INSTALLED_APPS

SQL_FOLDER = "/sql/"

APP_SQL_FOLDERS = [
    (os.path.join(BASE_DIR, app + SQL_FOLDER), app) for app in INSTALLED_APPS
    if os.path.isdir(os.path.join(BASE_DIR, app + SQL_FOLDER))
]

SQL_FILES = [
    sorted([path + file for file in os.listdir(path) if file.lower().endswith('.sql')])
    for path, app in APP_SQL_FOLDERS
]


def load_file(path):
    with open(path, 'r') as f:
        return f.read()


class Migration(migrations.Migration):

    dependencies = [
        (app, '__first__') for path, app in APP_SQL_FOLDERS
    ]

    operations = [
        migrations.RunSQL(load_file(f)) for f in list(itertools.chain.from_iterable(SQL_FILES))
    ]

I hope someone finds this helpful; it worked just fine for me! If you have any questions, please let me know.

NOTE: This might not be the best solution since I'm just getting started with django, however still wanted to share this "How-to" with you all since I didn't find much information while googling about this.

Reference URL: https://stackoverflow.com/questions/25960850/loading-initial-data-with-django-1-7-and-data-migrations