Wednesday, January 21, 2015

Multiple Cassandra Nodes Across Different Data Centers

This walkthrough was done with Cassandra 1.0.12.

Topology





Token Allocation
Since Cassandra 1.0.12 does not ship with a token generator, it is recommended to download https://raw.github.com/riptano/ComboAMI/2.2/tokentoolv2.py and use it to generate a token allocation table like the one below:


Node    Hostname          IP Address     Token                                       Data Center  Rack
node0   clouddb1.gc.net   172.16.70.32   0                                           DC1          RAC1
node1   clouddb2.gc.net   172.16.70.41   56713727820156410577229101238628035242     DC1          RAC1
node2   clouddb3.gc.net   172.16.70.42   113427455640312821154458202477256070485    DC1          RAC1
node3   clouddb4.gc.net   172.16.70.43   28356863910078205288614550619314017621     DC2          RAC1
node4   clouddb5.gc.net   172.16.70.44   85070591730234615865843651857942052863     DC2          RAC1
node5   clouddb6.gc.net   172.16.70.45   141784319550391026443072753096570088106    DC2          RAC1
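For reference, the table can be reproduced with a few lines of Python: each data center gets tokens evenly spaced over the RandomPartitioner range (0 to 2^127), and each additional data center is shifted by a fraction of one slot so the rings interleave. This is only a sketch of the scheme the table follows; tokentoolv2.py itself may round or order things slightly differently.

# Sketch of the interleaved multi-DC token scheme shown in the table above.
# RandomPartitioner in Cassandra 1.x uses tokens in the range [0, 2**127).
RING_RANGE = 2 ** 127

def multi_dc_tokens(nodes_per_dc, dc_count):
    """Evenly spaced tokens per data center; each DC is shifted by a fraction
    of one slot so nodes from different DCs alternate around the ring."""
    tokens = {}
    for dc in range(dc_count):
        offset = dc * (RING_RANGE // (nodes_per_dc * dc_count))
        tokens[dc] = [(i * RING_RANGE) // nodes_per_dc + offset
                      for i in range(nodes_per_dc)]
    return tokens

for dc, toks in multi_dc_tokens(3, 2).items():
    print("DC%d: %s" % (dc + 1, toks))
# DC1: 0, 56713727820156410577229101238628035242, 113427455640312821154458202477256070485
# DC2: 28356863910078205288614550619314017621, 85070591730234615865843651857942052863,
#      141784319550391026443072753096570088106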



Modify cassandra.yaml
Following the token allocation table, fill in each node's token and hostname as initial_token and listen_address. For example:
On node0:
initial_token: 0
listen_address: clouddb1.gc.net


On node1:
initial_token: 56713727820156410577229101238628035242
listen_address: clouddb2.gc.net
and so on for the remaining nodes (a short script that prints these overrides for every node is sketched below).
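The remaining overrides follow the same pattern. A small helper (hypothetical, not part of Cassandra) can print the two lines for every node straight from the allocation table:

# Hypothetical helper: print the initial_token / listen_address overrides
# for every node, using the values from the token allocation table above.
NODES = [
    ("clouddb1.gc.net", 0),
    ("clouddb2.gc.net", 56713727820156410577229101238628035242),
    ("clouddb3.gc.net", 113427455640312821154458202477256070485),
    ("clouddb4.gc.net", 28356863910078205288614550619314017621),
    ("clouddb5.gc.net", 85070591730234615865843651857942052863),
    ("clouddb6.gc.net", 141784319550391026443072753096570088106),
]

for host, token in NODES:
    print("# %s" % host)
    print("initial_token: %d" % token)
    print("listen_address: %s" % host)
    print()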


The snitch tells Cassandra about the cluster topology so that replicas can be spread out and a single node failure does not take out multiple copies of the same data.
The topology is described in terms of data centers and racks; in this test environment the nodes are split into DC1 and DC2, and every node sits on the same first rack (RAC1).

The original comment in cassandra.yaml reads:
# Set this to a class that implements
# IEndpointSnitch.  The snitch has two functions:
# - it teaches Cassandra enough about your network topology to route
#   requests efficiently
# - it allows Cassandra to spread replicas around your cluster to avoid
#   correlated failures. It does this by grouping machines into
#   "datacenters" and "racks."  Cassandra will do its best not to have
#   more than one replica on the same "rack" (which may not actually
#   be a physical location)

Cassandra provides several snitch implementations:

  • SimpleSnitch: Treats Strategy order as proximity. This improves cache locality when disabling read repair, which can further improve throughput. Only appropriate for single-datacenter deployments.

  • PropertyFileSnitch: Proximity is determined by rack and data center, which are explicitly configured in cassandra-topology.properties.

  • RackInferringSnitch: Proximity is determined by rack and data center, which are assumed to correspond to the 3rd and 2nd octet of each node's IP address, respectively. Unless this happens to match your deployment conventions (as it did Facebook's), this is best used as an example of writing a custom Snitch class.

  • Ec2Snitch: Appropriate for EC2 deployments in a single Region. Loads Region and Availability Zone information from the EC2 API. The Region is treated as the Datacenter, and the Availability Zone as the rack. Only private IPs are used, so this will not work across multiple Regions.

  • Ec2MultiRegionSnitch: Uses public IPs as broadcast_address to allow cross-region connectivity. (Thus, you should set seed addresses to the public IP as well.) You will need to open the storage_port or ssl_storage_port on the public IP firewall. (For intra-Region traffic, Cassandra will switch to the private IP after establishing a connection.)
This setup uses PropertyFileSnitch as the endpoint snitch, since it supports picking up changes to the property file at runtime.
In conf/cassandra.yaml on node0 through node5:
endpoint_snitch: PropertyFileSnitch

Also edit conf/cassandra-topology.properties to assign each node to a data center and rack:
172.16.70.32=DC1:RAC1
172.16.70.41=DC1:RAC1
172.16.70.42=DC1:RAC1

172.16.70.43=DC2:RAC1
172.16.70.44=DC2:RAC1
172.16.70.45=DC2:RAC1

default=DC1:r1
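PropertyFileSnitch simply looks an address up in this file and falls back to the default entry for nodes that are not listed. The following Python snippet is only an illustration of that lookup, not the actual Cassandra implementation:

# Illustrative sketch of a PropertyFileSnitch-style lookup based on the
# cassandra-topology.properties format shown above.
def load_topology(path="conf/cassandra-topology.properties"):
    mapping, default = {}, None
    with open(path) as f:
        for raw in f:
            line = raw.strip()
            if not line or line.startswith("#"):
                continue
            key, value = line.split("=", 1)
            dc, rack = value.split(":", 1)
            if key == "default":
                default = (dc, rack)
            else:
                mapping[key] = (dc, rack)
    return mapping, default

mapping, default = load_topology()
print(mapping.get("172.16.70.43", default))  # -> ('DC2', 'RAC1')
print(mapping.get("10.0.0.99", default))     # unlisted address falls back to ('DC1', 'r1')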



Seed Node
One seed node is configured per data center: node0 (clouddb1) in DC1 and node3 (clouddb4) in DC2.
In conf/cassandra.yaml on node0 through node5:
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "clouddb1.gc.ubicloud.net,clouddb4.gc.ubicloud.net"

Start the Services
Start the Cassandra process on node0 through node5, in order.



================================================
Postscript

Create a sample keyspace:
$ ./cqlsh localhost
> CREATE KEYSPACE sample WITH strategy_class = 'NetworkTopologyStrategy' AND strategy_options:DC1 = '3' and strategy_options:DC2 = '3';

The resulting ring:
$ ./nodetool -h self ring
Address         DC          Rack        Status State   Load            Owns    Token
                                                                               169417178424467235000914166253263322299
node0  172.16.70.32    DC1         RAC1        Up     Normal  93.18 KB        0.43%   0
node4  172.16.70.44    DC2         RAC1        Up     Normal  74.67 KB        32.91%  55989722784154413846455963776007251813
node1  172.16.70.41    DC1         RAC1        Up     Normal  97.89 KB        0.43%   56713727820156410577229101238628035242
node5  172.16.70.45    DC2         RAC1        Up     Normal  81.01 KB        32.91%  112703450604310824423685065014635287055
node2  172.16.70.42    DC1         RAC1        Up     Normal  97.66 KB        0.43%   113427455640312821154458202477256070484
node3  172.16.70.43    DC2         RAC1        Up     Normal  81.01 KB        32.91%  169417178424467235000914166253263322299

$ ./nodetool -h self describering sample
TokenRange:
  TokenRange(start_token:55989722784154413846455963776007251813, end_token:56713727820156410577229101238628035242, endpoints:[172.16.70.45, 172.16.70.43, 172.16.70.44, 172.16.70.41, 172.16.70.42, 172.16.70.32], rpc_endpoints:[0.0.0.0, 0.0.0.0, 0.0.0.0, 0.0.0.0, 0.0.0.0, 0.0.0.0], endpoint_details:[EndpointDetails(host:172.16.70.45, datacenter:DC2, rack:RAC1), EndpointDetails(host:172.16.70.43, datacenter:DC2, rack:RAC1), EndpointDetails(host:172.16.70.44, datacenter:DC2, rack:RAC1), EndpointDetails(host:172.16.70.41, datacenter:DC1, rack:RAC1), EndpointDetails(host:172.16.70.42, datacenter:DC1, rack:RAC1), EndpointDetails(host:172.16.70.32, datacenter:DC1, rack:RAC1)])
  TokenRange(start_token:113427455640312821154458202477256070484, end_token:169417178424467235000914166253263322299, endpoints:[172.16.70.43, 172.16.70.44, 172.16.70.45, 172.16.70.32, 172.16.70.41, 172.16.70.42], rpc_endpoints:[0.0.0.0, 0.0.0.0, 0.0.0.0, 0.0.0.0, 0.0.0.0, 0.0.0.0], endpoint_details:[EndpointDetails(host:172.16.70.43, datacenter:DC2, rack:RAC1), EndpointDetails(host:172.16.70.44, datacenter:DC2, rack:RAC1), EndpointDetails(host:172.16.70.45, datacenter:DC2, rack:RAC1), EndpointDetails(host:172.16.70.32, datacenter:DC1, rack:RAC1), EndpointDetails(host:172.16.70.41, datacenter:DC1, rack:RAC1), EndpointDetails(host:172.16.70.42, datacenter:DC1, rack:RAC1)])
  TokenRange(start_token:169417178424467235000914166253263322299, end_token:0, endpoints:[172.16.70.44, 172.16.70.45, 172.16.70.43, 172.16.70.32, 172.16.70.41, 172.16.70.42], rpc_endpoints:[0.0.0.0, 0.0.0.0, 0.0.0.0, 0.0.0.0, 0.0.0.0, 0.0.0.0], endpoint_details:[EndpointDetails(host:172.16.70.44, datacenter:DC2, rack:RAC1), EndpointDetails(host:172.16.70.45, datacenter:DC2, rack:RAC1), EndpointDetails(host:172.16.70.43, datacenter:DC2, rack:RAC1), EndpointDetails(host:172.16.70.32, datacenter:DC1, rack:RAC1), EndpointDetails(host:172.16.70.41, datacenter:DC1, rack:RAC1), EndpointDetails(host:172.16.70.42, datacenter:DC1, rack:RAC1)])
  TokenRange(start_token:56713727820156410577229101238628035242, end_token:112703450604310824423685065014635287055, endpoints:[172.16.70.45, 172.16.70.43, 172.16.70.44, 172.16.70.42, 172.16.70.32, 172.16.70.41], rpc_endpoints:[0.0.0.0, 0.0.0.0, 0.0.0.0, 0.0.0.0, 0.0.0.0, 0.0.0.0], endpoint_details:[EndpointDetails(host:172.16.70.45, datacenter:DC2, rack:RAC1), EndpointDetails(host:172.16.70.43, datacenter:DC2, rack:RAC1), EndpointDetails(host:172.16.70.44, datacenter:DC2, rack:RAC1), EndpointDetails(host:172.16.70.42, datacenter:DC1, rack:RAC1), EndpointDetails(host:172.16.70.32, datacenter:DC1, rack:RAC1), EndpointDetails(host:172.16.70.41, datacenter:DC1, rack:RAC1)])
  TokenRange(start_token:112703450604310824423685065014635287055, end_token:113427455640312821154458202477256070484, endpoints:[172.16.70.43, 172.16.70.44, 172.16.70.45, 172.16.70.42, 172.16.70.32, 172.16.70.41], rpc_endpoints:[0.0.0.0, 0.0.0.0, 0.0.0.0, 0.0.0.0, 0.0.0.0, 0.0.0.0], endpoint_details:[EndpointDetails(host:172.16.70.43, datacenter:DC2, rack:RAC1), EndpointDetails(host:172.16.70.44, datacenter:DC2, rack:RAC1), EndpointDetails(host:172.16.70.45, datacenter:DC2, rack:RAC1), EndpointDetails(host:172.16.70.42, datacenter:DC1, rack:RAC1), EndpointDetails(host:172.16.70.32, datacenter:DC1, rack:RAC1), EndpointDetails(host:172.16.70.41, datacenter:DC1, rack:RAC1)])
  TokenRange(start_token:0, end_token:55989722784154413846455963776007251813, endpoints:[172.16.70.44, 172.16.70.45, 172.16.70.43, 172.16.70.41, 172.16.70.42, 172.16.70.32], rpc_endpoints:[0.0.0.0, 0.0.0.0, 0.0.0.0, 0.0.0.0, 0.0.0.0, 0.0.0.0], endpoint_details:[EndpointDetails(host:172.16.70.44, datacenter:DC2, rack:RAC1), EndpointDetails(host:172.16.70.45, datacenter:DC2, rack:RAC1), EndpointDetails(host:172.16.70.43, datacenter:DC2, rack:RAC1), EndpointDetails(host:172.16.70.41, datacenter:DC1, rack:RAC1), EndpointDetails(host:172.16.70.42, datacenter:DC1, rack:RAC1), EndpointDetails(host:172.16.70.32, datacenter:DC1, rack:RAC1)])

The describering output shows that every token range is replicated to all six nodes (the sample keyspace keeps three replicas in each data center), and that the ring is ordered as follows:
4 -> 1 -> 5 -> 2 -> 3 -> 0 -> 4
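Both the ordering and the endpoint lists in describering can be reproduced from the tokens alone: sort the nodes by token to get the ring, then for each range walk the ring clockwise, collecting nodes until every data center holds its configured replica count. The sketch below is a simplified model of NetworkTopologyStrategy, not Cassandra's actual implementation; rack awareness is ignored (which does not matter here, since every node is on RAC1) and the tokens and addresses come from the nodetool ring output above.

# Simplified model of replica placement on this ring.
RING = sorted([
    ("node0", "172.16.70.32", "DC1", 0),
    ("node1", "172.16.70.41", "DC1", 56713727820156410577229101238628035242),
    ("node2", "172.16.70.42", "DC1", 113427455640312821154458202477256070484),
    ("node3", "172.16.70.43", "DC2", 169417178424467235000914166253263322299),
    ("node4", "172.16.70.44", "DC2", 55989722784154413846455963776007251813),
    ("node5", "172.16.70.45", "DC2", 112703450604310824423685065014635287055),
], key=lambda n: n[3])

# Ring order: node0 -> node4 -> node1 -> node5 -> node2 -> node3 -> (node0)
print(" -> ".join(name for name, _, _, _ in RING))

def replicas(token, rf={"DC1": 3, "DC2": 3}):
    """Start at the first node whose token is >= the given token and walk
    clockwise until every data center holds rf[dc] replicas."""
    start = next((i for i, n in enumerate(RING) if n[3] >= token), 0)
    needed, picked = dict(rf), []
    for step in range(len(RING)):
        name, ip, dc, _ = RING[(start + step) % len(RING)]
        if needed.get(dc, 0) > 0:
            picked.append(ip)
            needed[dc] -= 1
    return picked

# With DC1:3 and DC2:3 every range ends up on all six nodes, which matches
# what describering reports for the sample keyspace.
print(replicas(10))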

==================================================
If Cassandra 2.1.x is used instead, the result looks like this:

$ ./nodetool -h self status
Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address       Load       Tokens  Owns  Host ID                               Rack
UN 172.16.70.32  107.14 KB  256     ?     b5d8b0c5-5c4c-43ad-b456-e3a0b2dbf348  RAC1
UN 172.16.70.41  141.17 KB  256     ?     51466f2f-a986-4843-9e36-6fca697301ac  RAC1
UN 172.16.70.42  141.99 KB  256     ?     f7faaba2-f5dd-46a0-b272-5fed57bf1123  RAC1
Datacenter: DC2
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address       Load       Tokens  Owns  Host ID                               Rack
UN 172.16.70.43  111.84 KB  256     ?     810b84bf-25fd-4787-b406-9973339ef77f  RAC1
UN 172.16.70.44  126.95 KB  256     ?     01db41b4-b000-46b0-99f4-063e2ddda4dd  RAC1
UN 172.16.70.45  141.58 KB  256     ?     910ed07e-8484-434d-bc66-3e685e4311c4  RAC1