金子邦彦研究室プログラミングOctave の活用Octave で書かれた K-means プログラムの例

Octave で書かれた K-means プログラムの例

Octave で書かれた K-means プログラムの例を示す.K-means はクラスタリングのアルゴリズム. なぜ Octave で書いているかというと,学習用,またいろいろな距離関数 (metric function) を試したいという気持ちがあり, C 言語などにこだわらず(速いことにこだわらず),Octave で書かれたK-means プログラムにも価値があると考えたからです.

※ 試しに動作させましたが,Octave でも,むちゃくちゃは遅くありません. 私の Linux マシンで 512×512 のカラー画像をクラスタ数 10 でクラスタリングしたところ 20秒弱.

※ とはいえ,性能をとても重視する(ソースコードには興味ない)場合には,「k-means クラスタリングの例 (Open-CV を使用) 」の Web ページに,OpenCV を利用した K-means の Octave プログラムを載せていますので,そちらを見てください.

前準備 (Pre-requisite)

Octave のインストール (Install Octave)


The following code is the K-Means algorithm implementationwrittein in Octave
#
# The following code is the K-Means algorithm implementationwrittein in Octave
# Octave で書かれた K-means アルゴリズムの例
#
# The following code is the K-Means algorithm implementationwrittein in Octave
# Octave で書かれた K-means アルゴリズムの例

function [width, height] = mat_size(X)
  height = size(X)(1);
  width = size(X)(2);
endfunction

function Metrics = calc_metrics(Data, p)
# metrics between all data points in 'Data' and 'p'.
  Diff = (Data - repmat( p, size(Data)(1), 1 ))';
  # Euclidian distance
  Metrics = sum(Diff .* Diff);
  clear Diff
endfunction

function num = find_empty_cluster(Data, assignment, n_clusters)

  for i = 1:n_clusters
    # select data points in the i-th cluster,
    Points = Data( find( assignment(:,2) == i ), : );
    if ( size(Points)(1) == 0 )
      num = i;
      clear Points;
      return;
    endif
    clear Points;
  endfor
  num = 0;

endfunction

function [NewCentroidPoints, assignment] = kmeans_step(Data, CentroidPoints, n_clusters)

  # get the number of data points (= height) and the dimesnion (= width)
  [width, height] = mat_size(Data);

  # 'D' is a matrix to store the distance between data points and centroids
  # データ点と centroids 間の距離を行列 D に格納する
  D = zeros(height,n_clusters);
  for i = 1:n_clusters
    D(:,i) = calc_metrics( Data, CentroidPoints(i,:) );
  endfor

  # the shortes distance from each data point to any centroids
  # 各データポイントから centroids までの最短距離
  shortest = min(D')';

  # get the centroid number for each data point
  # 各データポイントについて,最も近い centroid の番号を得る
  [m,n] = find( D == repmat(shortest,1,n_clusters) );
  assignment = zeros(height,2);
  assignment(:,1) = 1:height;    # data point number
  assignment(m,2) = n; # cluster number

  done = 0;
  while ( done == 0 )
    # try to find an empty cluster
    empty_cluster_num = find_empty_cluster(Data, assignment, n_clusters);
    if ( empty_cluster_num > 0 )
      # assign the farest data point to the empty cluster
      data_point_num = find( shortest == max(shortest) );
      assignment(data_point_num, empty_cluster_num);
      # new, the farest data point becomes a centroid
      shorttest(data_point_num) = 0;
    else
      done = 1;
    endif
  endwhile

  clear shortest;
  clear D;

  # new centoid of each cluster
  # 各クラスタの centroid を求める
     for i = 1:n_clusters
    # select data points in the i-th cluster,
    Points = Data( find( assignment(:,2) == i ), : );
    # compute the centroid of i-th cluster
    if ( size(Points)(1) == 1 )
      NewCentroidPoints(i,:) = Points;
    else
      NewCentroidPoints(i,:) = mean(Points);
    endif
    clear Points;
  endfor

endfunction



#'
# Read Color Image Data
[rgb, immap, alpha] = imread("lena_std.jpg");
mono = rgb2gray( rgb );
colormap(gray(256));

# Initialize 3-dimensional data points
# Set color image data into 'Data'
A = double(rgb)/255;
Data = zeros( size(A)(1) * size(A)(2), size(A)(3) );
Data(:,1) = reshape( A(:,:,1), size(A)(1) * size(A)(2), 1 );
Data(:,2) = reshape( A(:,:,2), size(A)(1) * size(A)(2), 1 );
Data(:,3) = reshape( A(:,:,3), size(A)(1) * size(A)(2), 1 );
n_clusters = 10;

# get the number of data points (= height) and the dimesnion (= width)
[width, height] = mat_size(Data);

# clusters not yet constructed.
# select different 'n_clusters' data points number randomly to create initial 'centoroids'
# クラスタはまだできていない
# 'centroids' の初期値を得るために,'n_clusters' 個の異なるデータ点番号をランダムに選ぶ.
CentroidPoints = Data( randperm(height)(1:n_clusters), : );
[NewCentroidPoints, assignment] = kmeans_step(Data, CentroidPoints, n_clusters);

a = 1;
while( a > 0 )
  CentroidPoints = NewCentroidPoints;
  clear NewCentroidPoints;
  old_assignment = assignment(:,2);
  clear assignment;
  [NewCentroidPoints, assignment] = kmeans_step(Data, CentroidPoints, n_clusters);

  # assignment changed ?
  # 'assignment' に変更があったか
  a = max( abs( old_assignment - assignment(:,2) ) );
endwhile

V = NewCentroidPoints( assignment(:,2),: );
NEWIMAGE = zeros( size(A)(1), size(A)(2), size(A)(3) );
NEWIMAGE(:,:,1) = reshape( V(:,1), size(A)(1), size(A)(2) );
NEWIMAGE(:,:,2) = reshape( V(:,2), size(A)(1), size(A)(2) );
NEWIMAGE(:,:,3) = reshape( V(:,3), size(A)(1), size(A)(2) );
clear V;
imwrite(NEWIMAGE, "new.png");
disp "done"

Input Data Example

[image]

Output Data Example

[image]