
Keras binary_crossentropy vs categorical_crossentropy performance?

crosscheck 2020. 7. 15. 07:40



I am trying to train a CNN to classify text by topic. When I use binary cross-entropy I get ~80% accuracy; with categorical cross-entropy I get ~50% accuracy.

I don't understand why. It's a multi-class problem, so doesn't that mean I have to use categorical cross-entropy, and that the binary cross-entropy results are meaningless?

from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Conv1D, MaxPooling1D

model = Sequential()
model.add(embedding_layer)   # embedding_layer is defined elsewhere in the question's code
model.add(Dropout(0.25))
# convolution layers
model.add(Conv1D(filters=32,
                 kernel_size=4,
                 padding='valid',
                 activation='relu'))
model.add(MaxPooling1D(pool_size=2))
# dense layers
model.add(Flatten())
model.add(Dense(256))
model.add(Dropout(0.25))
model.add(Activation('relu'))
# output layer
model.add(Dense(len(class_id_index)))
model.add(Activation('softmax'))

Then I compile it using categorical_crossentropy as the loss function:

model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

or

model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

The reason for this apparent performance discrepancy between categorical and binary cross-entropy is what @xtof54 has already reported in his answer:

the accuracy computed with the Keras method evaluate is just plain wrong when using binary_crossentropy with more than 2 labels

I would like to elaborate more on this, demonstrate the actual underlying issue, explain it, and offer a remedy.

This behavior is not a bug; the underlying reason is a rather subtle and undocumented issue in how Keras actually guesses which accuracy to use, depending on the loss function you have selected, when you simply include metrics=['accuracy'] in your model compilation. In other words, while your first compilation option

model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

is valid, your second one:

model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

will not produce what you expect; but the reason is not the use of binary cross-entropy itself, which, at least in principle, is an absolutely valid loss function.

Why is that? If you check the metrics source code, Keras does not define a single accuracy metric, but several different ones, among them binary_accuracy and categorical_accuracy. What happens under the hood is that, since you have selected binary cross-entropy as your loss function and have not specified a particular accuracy metric, Keras (wrongly...) infers that you are interested in binary_accuracy, and this is what it returns, while in fact you are interested in categorical_accuracy.
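To see concretely how the two metrics can diverge, here is a minimal NumPy hand computation that mirrors what they measure (the prediction values are made up purely for illustration):

import numpy as np

y_true = np.array([[0, 1, 0, 0],             # one-hot targets, 4 classes
                   [1, 0, 0, 0]])
y_pred = np.array([[0.1, 0.3, 0.4, 0.2],     # argmax = 2 -> wrong class
                   [0.6, 0.2, 0.1, 0.1]])    # argmax = 0 -> correct class

# categorical_accuracy: does the argmax match the true class?
cat_acc = np.mean(np.argmax(y_true, 1) == np.argmax(y_pred, 1))   # 0.5

# binary_accuracy: element-wise comparison after thresholding at 0.5
bin_acc = np.mean((y_pred > 0.5) == y_true)                       # 0.875

print(cat_acc, bin_acc)  # binary_accuracy looks far better than the model really is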

Let's verify that this is indeed the case, using the MNIST CNN example in Keras, with the following modification:

model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])  # WRONG way

model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=2,  # only 2 epochs, for demonstration purposes
          verbose=1,
          validation_data=(x_test, y_test))

# Keras reported accuracy:
score = model.evaluate(x_test, y_test, verbose=0) 
score[1]
# 0.9975801164627075

# Actual accuracy calculated manually:
import numpy as np
y_pred = model.predict(x_test)
acc = sum([np.argmax(y_test[i])==np.argmax(y_pred[i]) for i in range(10000)])/10000
acc
# 0.98780000000000001

score[1]==acc
# False    

To remedy this, i.e. to use indeed binary cross entropy as your loss function (as I said, nothing wrong with this, at least in principle) while still getting the categorical accuracy required by the problem at hand, you should ask explicitly for categorical_accuracy in the model compilation as follows:

from keras.metrics import categorical_accuracy
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=[categorical_accuracy])

In the MNIST example, after training, scoring, and predicting the test set as I show above, the two metrics now are the same, as they should be:

# Keras reported accuracy:
score = model.evaluate(x_test, y_test, verbose=0) 
score[1]
# 0.98580000000000001

# Actual accuracy calculated manually:
y_pred = model.predict(x_test)
acc = sum([np.argmax(y_test[i])==np.argmax(y_pred[i]) for i in range(10000)])/10000
acc
# 0.98580000000000001

score[1]==acc
# True    

System setup:

Python version 3.5.3
Tensorflow version 1.2.1
Keras version 2.0.4

UPDATE: After my post, I discovered that this issue had already been identified in this answer.


It all depends on the type of classification problem you are dealing with. There are three main categories:

  • binary classification (two target classes)
  • multi-class classification (more than two exclusive targets)
  • multi-label classification (more than two non-exclusive targets), in which multiple target classes can be active at the same time

In the first case, binary cross-entropy should be used and targets should be encoded as one-hot vectors.

In the second case, categorical cross-entropy should be used and targets should be encoded as one-hot vectors.

In the last case, binary cross-entropy should be used and targets should be encoded as one-hot vectors. Each output neuron (or unit) is considered as a separate random binary variable, and the likelihood of the entire output vector is the product of the likelihoods of the single binary variables; the loss is therefore the sum of the binary cross-entropies of the individual output units.
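A minimal Keras sketch of the difference between the second and third case; only the output activation, the loss, and the target encoding change (the input dimension, hidden size, and num_classes below are placeholder values, not taken from the question):

from keras.models import Sequential
from keras.layers import Dense

num_classes = 5   # hypothetical number of target classes

# multi-class (mutually exclusive targets): softmax + categorical_crossentropy,
# targets are one-hot vectors with exactly one 1 per sample
multiclass_model = Sequential([
    Dense(64, activation='relu', input_shape=(100,)),
    Dense(num_classes, activation='softmax'),
])
multiclass_model.compile(loss='categorical_crossentropy',
                         optimizer='adam', metrics=['accuracy'])

# multi-label (non-exclusive targets): sigmoid + binary_crossentropy,
# targets are 0/1 vectors that may contain several 1s per sample
multilabel_model = Sequential([
    Dense(64, activation='relu', input_shape=(100,)),
    Dense(num_classes, activation='sigmoid'),
])
multilabel_model.compile(loss='binary_crossentropy',
                         optimizer='adam', metrics=['accuracy'])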

Binary cross-entropy is defined as

$$\mathrm{BCE} = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i \log \hat{y}_i + (1-y_i)\log(1-\hat{y}_i)\right]$$

and categorical cross-entropy as

$$\mathrm{CCE} = -\sum_{i=1}^{C} y_i \log \hat{y}_i$$


I came across an "inverted" issue — I was getting good results with categorical_crossentropy (with 2 classes) and poor results with binary_crossentropy. It seems the problem was the wrong activation function. The correct settings were (a short sketch follows the list):

  • for binary_crossentropy: sigmoid activation, scalar target
  • for categorical_crossentropy: softmax activation, one-hot encoded target
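A sketch of the two matched settings for a 2-class problem (the input shape and variable names are hypothetical):

from keras.models import Sequential
from keras.layers import Dense
from keras.utils import to_categorical

# binary_crossentropy: a single sigmoid unit, scalar 0/1 targets (y_scalar)
m_binary = Sequential([Dense(1, activation='sigmoid', input_shape=(20,))])
m_binary.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

# categorical_crossentropy: two softmax units, one-hot targets
m_categorical = Sequential([Dense(2, activation='softmax', input_shape=(20,))])
m_categorical.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
# y_onehot = to_categorical(y_scalar, 2)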

It's a really interesting case. Actually, in your setup the following statement is true:

binary_crossentropy = len(class_id_index) * categorical_crossentropy

This means that, up to a constant multiplication factor, your losses are equivalent. The weird behaviour that you are observing during the training phase might be an example of the following phenomenon:

  1. At the beginning, the most frequent class dominates the loss, so the network learns to predict mostly this class for every example.
  2. After it has learnt the most frequent pattern, it starts discriminating among the less frequent classes. But when you are using adam, the learning rate has a much smaller value than it had at the beginning of training (because of the nature of this optimizer). This makes training slower and makes it less likely that your network will, e.g., leave a poor local minimum.

That's why this constant factor might help in the case of binary_crossentropy: after many epochs, the effective learning rate is greater than in the categorical_crossentropy case. I usually restart training (and the learning phase) a few times when I notice such behaviour, and/or adjust the class weights using the following pattern:

class_weight = 1 / class_frequency

This makes the losses from the less frequent classes balance the influence of the dominant class loss at the beginning of training and throughout the rest of the optimization process.
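A minimal sketch of that weighting pattern; y_train (integer labels), x_train, and y_train_onehot are assumed to exist and the epoch/batch values are placeholders:

import numpy as np

counts = np.bincount(y_train)                              # per-class frequencies
class_weight = {c: 1.0 / counts[c] for c in range(len(counts))}

# Keras accepts the dict directly in fit(), weighting each sample's loss by its class
model.fit(x_train, y_train_onehot,
          class_weight=class_weight,
          epochs=10, batch_size=32)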

EDIT:

Actually, I checked that even though mathematically:

binary_crossentropy = len(class_id_index) * categorical_crossentropy

should hold, in the case of Keras it's not true, because Keras automatically normalizes all outputs to sum up to 1. This is the actual reason behind this weird behaviour, since in the multi-class case such normalization harms the training.


After commenting on @Marcin's answer, I more carefully checked one of my students' code, where I found the same weird behavior, even after only 2 epochs! (So @Marcin's explanation was not very likely in my case.)

And I found that the answer is actually very simple: the accuracy computed with the Keras method evaluate is just plain wrong when using binary_crossentropy with more than 2 labels. You can check that by recomputing the accuracy yourself (first call the Keras method "predict" and then compute the number of correct answers returned by predict): you get the true accuracy, which is much lower than the Keras "evaluate" one.


A simple example under a multi-class setting to illustrate:

Suppose you have 4 classes (one-hot encoded) and below is just one prediction:

true_label = [0,1,0,0], predicted_label = [0,0,1,0]

When using categorical_crossentropy, the accuracy is just 0: it only cares about whether you get the concerned class right.

However, when using binary_crossentropy, the accuracy is calculated over all classes, so it would be 50% for this prediction. The final result will be the mean of the individual accuracies for both cases.
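The same example, written out as a small NumPy computation:

import numpy as np

true_label      = np.array([0, 1, 0, 0])
predicted_label = np.array([0, 0, 1, 0])

# categorical accuracy: argmax comparison -> 0 for this prediction
cat_acc = float(np.argmax(true_label) == np.argmax(predicted_label))   # 0.0

# binary accuracy: element-wise comparison -> 2 of 4 positions match
bin_acc = np.mean(true_label == predicted_label)                       # 0.5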

It is recommended to use categorical_crossentropy for multi-class problems (classes are mutually exclusive), but binary_crossentropy for multi-label problems.


As it is a multi-class problem, you have to use categorical_crossentropy; binary cross-entropy will produce bogus results, and most likely will only evaluate the first two classes.

50% for a multi-class problem can be quite good, depending on the number of classes. If you have n classes, then 100/n is the minimum performance you can get by outputting a random class.


Take a look at the equation and you will find that binary cross-entropy not only punishes label = 1, predicted = 0, but also label = 0, predicted = 1.

However, categorical cross-entropy only punishes the class with label = 1 when its predicted probability is low. That's why we make the assumption that there is only ONE positive label.
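A small hand computation of the two losses for a single one-hot target (the predicted probabilities are made up for illustration):

import numpy as np

y_true = np.array([0., 1., 0., 0.])
y_pred = np.array([0.1, 0.6, 0.2, 0.1])   # hypothetical predicted probabilities

# categorical cross-entropy: only the term of the true class contributes
cce = -np.sum(y_true * np.log(y_pred))                      # -log(0.6) ≈ 0.511

# binary cross-entropy (averaged over outputs): the zero labels contribute too,
# penalizing any probability mass placed on the wrong classes
bce = -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
# = ( -log(0.6) - log(0.9) - log(0.8) - log(0.9) ) / 4 ≈ 0.236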


When using the categorical_crossentropy loss, your targets should be in categorical format (e.g. if you have 10 classes, the target for each sample should be a 10-dimensional vector that is all zeros except for a 1 at the index corresponding to the class of the sample).


You are passing a target array of shape (x-dim, y-dim) while using categorical_crossentropy as the loss. categorical_crossentropy expects targets to be binary matrices (1s and 0s) of shape (samples, classes). If your targets are integer classes, you can convert them to the expected format via:

from keras.utils import to_categorical
y_binary = to_categorical(y_int)

Alternatively, you can use the loss function sparse_categorical_crossentropy instead, which does expect integer targets.

model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
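A short usage sketch contrasting the two target formats (the integer labels and the x array are hypothetical):

import numpy as np
from keras.utils import to_categorical

y_int = np.array([2, 0, 1, 2])        # hypothetical integer class labels

# Option 1: one-hot targets with categorical_crossentropy
y_binary = to_categorical(y_int)      # shape (4, 3)
# model.compile(loss='categorical_crossentropy', ...); model.fit(x, y_binary, ...)

# Option 2: keep integer targets and use sparse_categorical_crossentropy
# model.compile(loss='sparse_categorical_crossentropy', ...); model.fit(x, y_int, ...)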

binary_crossentropy(y_target, y_predict) doesn't have to be applied only to binary classification problems.

In the source code of binary_crossentropy(), the TensorFlow function nn.sigmoid_cross_entropy_with_logits(labels=target, logits=output) is actually used. And in its documentation, it says:

Measures the probability error in discrete classification tasks in which each class is independent and not mutually exclusive. For instance, one could perform multilabel classification where a picture can contain both an elephant and a dog at the same time.

Reference URL: https://stackoverflow.com/questions/42081257/keras-binary-crossentropy-vs-categorical-crossentropy-performance
