Commit 57a63f85 authored by polandll, committed by Jon Haddad jon@jonhaddad.com

Improving Cassandra configuration docs

* Update copyright date to 2020
* Add glossary
* Rearranged rackdc for better flow
* Add topologies properties file
* Add commitlog_archiving file
* Add logback.xml info
* jvm.options files
* Removed invalid reference to wiki from configuration file

Patch by Lorina Poland; Reviewed by Jon Haddad for CASSANDRA-15822
parent ecf2c9cc
# Cassandra storage config YAML
# NOTE:
# See http://wiki.apache.org/cassandra/StorageConfiguration for
# See https://cassandra.apache.org/doc/latest/configuration/ for
# full explanations of configuration directives
# /NOTE
......@@ -20,8 +20,6 @@ cluster_name: 'Test Cluster'
# Specifying initial_token will override this setting on the node's initial start,
# on subsequent starts, this setting will apply even if initial token is set.
#
# If you already have a cluster with 1 token per node, and wish to migrate to
# multiple tokens per node, see http://wiki.apache.org/cassandra/Operations
num_tokens: 256
# Triggers automatic allocation of num_tokens tokens for this node. The allocation
......@@ -47,7 +45,6 @@ num_tokens: 256
# that do not have vnodes enabled.
# initial_token:
# See http://wiki.apache.org/cassandra/HintedHandoff
# May either be "true" or "false" to enable globally
hinted_handoff_enabled: true
......
......@@ -13,7 +13,7 @@ RUN apt-get update && apt-get install -y software-properties-common
RUN wget -qO - https://adoptopenjdk.jfrog.io/adoptopenjdk/api/gpg/key/public | apt-key add - \
&& add-apt-repository --yes https://adoptopenjdk.jfrog.io/adoptopenjdk/deb/ \
&& apt-get update \
&& apt-get install -y adoptopenjdk-8-hotspot ant
&& apt-get install -y adoptopenjdk-11-hotspot ant
RUN apt-get clean
......
......@@ -75,7 +75,7 @@ master_doc = 'index'
# General information about the project.
project = u'Apache Cassandra'
copyright = u'2016, The Apache Cassandra team'
copyright = u'2020, The Apache Cassandra team'
author = u'The Apache Cassandra team'
# The version info for the project you're documenting, acts as replacement for
......
.. _cassandra-cl-archive:
commitlog_archiving.properties file
===================================
The ``commitlog_archiving.properties`` configuration file can optionally set commands that are executed when archiving or restoring a commitlog segment.
===========================
Options
===========================
``archive_command=<command>``
-----------------------------
One command can be inserted with %path and %name arguments. %path is the fully qualified path of the commitlog segment to archive. %name is the filename of the commitlog. STDOUT, STDIN, or multiple commands cannot be executed. If multiple commands are required, add a pointer to a script in this option.
**Example:** archive_command=/bin/ln %path /backup/%name
**Default value:** blank
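
Because ``archive_command`` accepts only a single command, a multi-step archive is usually wrapped in a script. The following is a minimal sketch and is not shipped with Cassandra: the function name ``archive_segment``, the script path shown in the comment, and the ``/backup/commitlog`` destination are assumptions chosen for illustration.

```shell
#!/bin/sh
# Hypothetical helper: archive one commitlog segment.
# Possible use in commitlog_archiving.properties (the path is an assumption):
#   archive_command=/usr/local/bin/archive_commitlog.sh %path %name

archive_segment() {
    src="$1"                             # %path: fully qualified path of the segment
    name="$2"                            # %name: filename of the segment
    dest_dir="${3:-/backup/commitlog}"   # assumed backup location

    mkdir -p "$dest_dir" || return 1
    # Copy under a temporary name, then rename, so that a partial copy is
    # never mistaken for a complete archived segment.
    cp "$src" "$dest_dir/$name.tmp" || return 1
    mv "$dest_dir/$name.tmp" "$dest_dir/$name"
}

# Run only when arguments are supplied, so the file can also be sourced.
if [ "$#" -ge 2 ]; then
    archive_segment "$@"
fi
```

The copy-then-rename step is a design choice of this sketch, not a Cassandra requirement.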
``restore_command=<command>``
-----------------------------
One command can be inserted with %from and %to arguments. %from is the fully qualified path to an archived commitlog segment using the specified restore directories. %to defines the directory to the live commitlog location.
**Example:** restore_command=/bin/cp -f %from %to
**Default value:** blank
``restore_directories=<directory>``
-----------------------------------
Defines the directory to scan for recovery files.
**Default value:** blank
``restore_point_in_time=<timestamp>``
-------------------------------------
Restore mutations created up to and including this timestamp in GMT in the format ``yyyy:MM:dd HH:mm:ss``. Recovery will continue through the segment when the first client-supplied timestamp greater than this time is encountered, but only mutations less than or equal to this timestamp will be applied.
**Example:** 2020:04:30 20:43:12
**Default value:** blank
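
The timestamp layout uses colons in the date part, which is easy to get wrong by hand. As a small sketch, the standard ``date`` utility can print the current time in GMT in this layout; the variable name ``restore_point`` is only illustrative.

```shell
# Current time in GMT, formatted as yyyy:MM:dd HH:mm:ss.
restore_point=$(date -u +"%Y:%m:%d %H:%M:%S")

# Print a line in the form expected by commitlog_archiving.properties.
echo "restore_point_in_time=$restore_point"
```

Recording such a value just before a risky operation gives a known point to restore to.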
``precision=<timestamp_precision>``
-----------------------------------
Precision of the timestamp used in the inserts. Choice is generally MILLISECONDS or MICROSECONDS.
**Default value:** MICROSECONDS
.. _cassandra-envsh:
cassandra-env.sh file
=====================
The ``cassandra-env.sh`` bash script file can be used to pass additional options to the Java virtual machine (JVM), such as maximum and minimum heap size, rather than setting them in the environment. If the JVM settings are static and do not need to be computed from the node characteristics, the :ref:`cassandra-jvm-options` files should be used instead. Heap sizes are a common example of values that are computed from system characteristics.
For example, add ``JVM_OPTS="$JVM_OPTS -Dcassandra.load_ring_state=false"`` to the ``cassandra-env.sh`` file
and run the ``cassandra`` command to start the node. The option is set from the ``cassandra-env.sh`` file, which is equivalent to starting Cassandra with the command-line option ``cassandra -Dcassandra.load_ring_state=false``.
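
To illustrate the kind of computed setting that belongs in ``cassandra-env.sh`` rather than in the static ``jvm-*`` files, the sketch below derives a heap size from the amount of system memory. The sizing rule (half of RAM, capped at 8192 MB) and the function name ``pick_heap_mb`` are assumptions for illustration only; this is not the calculation that the shipped script performs.

```shell
# Hypothetical sizing rule: half of system memory, capped at 8192 MB.
pick_heap_mb() {
    system_memory_mb="$1"
    half=$((system_memory_mb / 2))
    if [ "$half" -gt 8192 ]; then
        echo 8192
    else
        echo "$half"
    fi
}

# On Linux, total memory in MB could be read with:
#   free -m | awk '/^Mem:/ {print $2}'
# A fixed value is used here so that the sketch runs anywhere.
heap_mb=$(pick_heap_mb 16384)

# Append the computed heap settings to the options passed to the JVM.
JVM_OPTS="${JVM_OPTS:-} -Xms${heap_mb}M -Xmx${heap_mb}M"
```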
The ``-D`` option specifies the start-up parameters in both the command line and ``cassandra-env.sh`` file. The following options are available:
``cassandra.auto_bootstrap=false``
----------------------------------
Facilitates setting auto_bootstrap to false on initial set-up of the cluster. The next time you start the cluster, you do not need to change the ``cassandra.yaml`` file on each node to revert to true, the default value.
``cassandra.available_processors=<number_of_processors>``
---------------------------------------------------------
In a multi-instance deployment, each Cassandra instance independently assumes that all CPU processors are available to it. This setting allows you to specify a smaller set of processors.
``cassandra.boot_without_jna=true``
-----------------------------------
If JNA fails to initialize, Cassandra fails to boot. Use this option to boot Cassandra without JNA.
``cassandra.config=<directory>``
--------------------------------
The directory location of the ``cassandra.yaml`` file. The default location depends on the type of installation.
``cassandra.ignore_dynamic_snitch_severity=true|false``
-------------------------------------------------------
Setting this property to true causes the dynamic snitch to ignore the severity indicator from gossip when scoring nodes. Explore failure detection and recovery and dynamic snitching for more information.
**Default:** false
``cassandra.initial_token=<token>``
-----------------------------------
Use when virtual nodes (vnodes) are not used. Sets the initial partitioner token for a node the first time the node is started.
Note: Vnodes are highly recommended as they automatically select tokens.
**Default:** disabled
``cassandra.join_ring=true|false``
----------------------------------
Set to false to start Cassandra on a node but not have the node join the cluster.
You can use ``nodetool join`` or a JMX call to join the ring afterwards.
**Default:** true
``cassandra.load_ring_state=true|false``
----------------------------------------
Set to false to clear all gossip state for the node on restart.
**Default:** true
``cassandra.metricsReporterConfigFile=<filename>``
--------------------------------------------------
Enable pluggable metrics reporter. Explore pluggable metrics reporting for more information.
``cassandra.partitioner=<partitioner>``
---------------------------------------
Set the partitioner.
**Default:** org.apache.cassandra.dht.Murmur3Partitioner
``cassandra.prepared_statements_cache_size_in_bytes=<cache_size>``
------------------------------------------------------------------
Set the cache size for prepared statements.
``cassandra.replace_address=<listen_address of dead node>|<broadcast_address of dead node>``
--------------------------------------------------------------------------------------------
To replace a node that has died, start a new node in its place, specifying the ``listen_address`` or ``broadcast_address`` of the dead node that the new node is assuming. The new node must not have any data in its data directory, which is the same state as before bootstrapping.
Note: The ``broadcast_address`` defaults to the ``listen_address`` except when using the ``Ec2MultiRegionSnitch``.
``cassandra.replayList=<table>``
--------------------------------
Allow restoring specific tables from an archived commit log.
``cassandra.ring_delay_ms=<number_of_ms>``
------------------------------------------
Defines the amount of time a node waits to hear from other nodes before formally joining the ring.
**Default:** 30000ms
``cassandra.native_transport_port=<port>``
------------------------------------------
Set the port on which the CQL native transport listens for clients.
**Default:** 9042
``cassandra.rpc_port=<port>``
-----------------------------
Set the port for the Thrift RPC service, which is used for client connections.
**Default:** 9160
``cassandra.storage_port=<port>``
---------------------------------
Set the port for inter-node communication.
**Default:** 7000
``cassandra.ssl_storage_port=<port>``
-------------------------------------
Set the SSL port for encrypted communication.
**Default:** 7001
``cassandra.start_native_transport=true|false``
-----------------------------------------------
Enable or disable the native transport server. See ``start_native_transport`` in ``cassandra.yaml``.
**Default:** true
``cassandra.start_rpc=true|false``
----------------------------------
Enable or disable the Thrift RPC server.
**Default:** true
``cassandra.triggers_dir=<directory>``
--------------------------------------
Set the default location for the trigger JARs.
**Default:** conf/triggers
``cassandra.write_survey=true``
-------------------------------
For testing new compaction and compression strategies. It allows you to experiment with different strategies and benchmark write performance differences without affecting the production workload.
``consistent.rangemovement=true|false``
---------------------------------------
Set to true to make Cassandra perform bootstrap safely without violating consistency. Set to false to disable this behavior.
.. _cassandra-jvm-options:
jvm-* files
===========
Several files for JVM configuration are included in Cassandra. The ``jvm-server.options`` file, and the corresponding ``jvm8-server.options`` and ``jvm11-server.options`` files, are the main files for settings that affect the operation of the Cassandra JVM on cluster nodes. These files include startup parameters, general JVM settings such as garbage collection, and heap settings. The ``jvm-clients.options`` and corresponding ``jvm8-clients.options`` and ``jvm11-clients.options`` files can be used to configure JVM settings for clients like ``nodetool`` and the ``sstable`` tools.
See each file for examples of settings.
.. note:: The ``jvm-*`` files replace the :ref:`cassandra-envsh` file used in Cassandra versions prior to Cassandra 3.0. The ``cassandra-env.sh`` bash script file is still useful if JVM settings must be dynamically calculated based on system settings. The ``jvm-*`` files only store static JVM settings.
.. _cassandra-logback-xml:
logback.xml file
================================
The ``logback.xml`` configuration file can optionally set logging levels for the logs written to ``system.log`` and ``debug.log``. The logging levels can also be set using ``nodetool setlogginglevel``.
===========================
Options
===========================
``appender name="<appender_choice>"...</appender>``
---------------------------------------------------
Specify log type and settings. Possible appender names are: ``SYSTEMLOG``, ``DEBUGLOG``, ``ASYNCDEBUGLOG``, and ``STDOUT``. ``SYSTEMLOG`` ensures that WARN and ERROR messages are written synchronously to the specified file. ``DEBUGLOG`` and ``ASYNCDEBUGLOG`` ensure that DEBUG messages are written either synchronously or asynchronously, respectively, to the specified file. ``STDOUT`` writes all messages to the console in a human-readable format.
**Example:** <appender name="SYSTEMLOG" class="ch.qos.logback.core.rolling.RollingFileAppender">
``<file> <filename> </file>``
-----------------------------
Specify the filename for a log.
**Example:** <file>${cassandra.logdir}/system.log</file>
``<level> <log_level> </level>``
--------------------------------
Specify the level for a log. Part of the filter. Levels are: ``ALL``, ``TRACE``, ``DEBUG``, ``INFO``, ``WARN``, ``ERROR``, ``OFF``. ``TRACE`` creates the most verbose log, ``ERROR`` the least.
.. note:: Increasing logging levels can generate heavy logging output on a moderately trafficked cluster.
You can use the ``nodetool getlogginglevels`` command to see the current logging configuration.
**Default:** INFO
**Example:** <level>INFO</level>
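``<rollingPolicy class="<rolling_policy_choice>"> <fileNamePattern><pattern_info></fileNamePattern> ... </rollingPolicy>``
--------------------------------------------------------------------------------------------------------------------------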
Specify the policy for rolling logs over to an archive.
**Example:** <rollingPolicy class="ch.qos.logback.core.rolling.SizeAndTimeBasedRollingPolicy">
``<fileNamePattern> <pattern_info> </fileNamePattern>``
-------------------------------------------------------
Specify the pattern information for rolling over the log to archive. Part of the rolling policy.
**Example:** <fileNamePattern>${cassandra.logdir}/system.log.%d{yyyy-MM-dd}.%i.zip</fileNamePattern>
``<maxFileSize> <size> </maxFileSize>``
---------------------------------------
Specify the maximum file size to trigger rolling a log. Part of the rolling policy.
**Example:** <maxFileSize>50MB</maxFileSize>
``<maxHistory> <number_of_days> </maxHistory>``
-----------------------------------------------
Specify the maximum number of days of archived log history to keep. Part of the rolling policy.
**Example:** <maxHistory>7</maxHistory>
``<encoder> <pattern>...</pattern> </encoder>``
-----------------------------------------------
Specify the format of the message. Part of the appender.
**Example:** <encoder> <pattern>%-5level [%thread] %date{ISO8601} %F:%L - %msg%n</pattern> </encoder>
Contents of default ``logback.xml``
-----------------------------------
.. code-block:: XML

   <configuration scan="true" scanPeriod="60 seconds">
     <jmxConfigurator />

     <!-- No shutdown hook; we run it ourselves in StorageService after shutdown -->

     <!-- SYSTEMLOG rolling file appender to system.log (INFO level) -->
     <appender name="SYSTEMLOG" class="ch.qos.logback.core.rolling.RollingFileAppender">
       <filter class="ch.qos.logback.classic.filter.ThresholdFilter">
         <level>INFO</level>
       </filter>
       <file>${cassandra.logdir}/system.log</file>
       <rollingPolicy class="ch.qos.logback.core.rolling.SizeAndTimeBasedRollingPolicy">
         <!-- rollover daily -->
         <fileNamePattern>${cassandra.logdir}/system.log.%d{yyyy-MM-dd}.%i.zip</fileNamePattern>
         <!-- each file should be at most 50MB, keep 7 days worth of history, but at most 5GB -->
         <maxFileSize>50MB</maxFileSize>
         <maxHistory>7</maxHistory>
         <totalSizeCap>5GB</totalSizeCap>
       </rollingPolicy>
       <encoder>
         <pattern>%-5level [%thread] %date{ISO8601} %F:%L - %msg%n</pattern>
       </encoder>
     </appender>

     <!-- DEBUGLOG rolling file appender to debug.log (all levels) -->
     <appender name="DEBUGLOG" class="ch.qos.logback.core.rolling.RollingFileAppender">
       <file>${cassandra.logdir}/debug.log</file>
       <rollingPolicy class="ch.qos.logback.core.rolling.SizeAndTimeBasedRollingPolicy">
         <!-- rollover daily -->
         <fileNamePattern>${cassandra.logdir}/debug.log.%d{yyyy-MM-dd}.%i.zip</fileNamePattern>
         <!-- each file should be at most 50MB, keep 7 days worth of history, but at most 5GB -->
         <maxFileSize>50MB</maxFileSize>
         <maxHistory>7</maxHistory>
         <totalSizeCap>5GB</totalSizeCap>
       </rollingPolicy>
       <encoder>
         <pattern>%-5level [%thread] %date{ISO8601} %F:%L - %msg%n</pattern>
       </encoder>
     </appender>

     <!-- ASYNCDEBUGLOG asynchronous appender to debug.log (all levels) -->
     <appender name="ASYNCDEBUGLOG" class="ch.qos.logback.classic.AsyncAppender">
       <queueSize>1024</queueSize>
       <discardingThreshold>0</discardingThreshold>
       <includeCallerData>true</includeCallerData>
       <appender-ref ref="DEBUGLOG" />
     </appender>

     <!-- STDOUT console appender to stdout (INFO level) -->
     <appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
       <filter class="ch.qos.logback.classic.filter.ThresholdFilter">
         <level>INFO</level>
       </filter>
       <encoder>
         <pattern>%-5level [%thread] %date{ISO8601} %F:%L - %msg%n</pattern>
       </encoder>
     </appender>

     <!-- Uncomment below and corresponding appender-ref to activate logback metrics
     <appender name="LogbackMetrics" class="com.codahale.metrics.logback.InstrumentedAppender" />
     -->

     <root level="INFO">
       <appender-ref ref="SYSTEMLOG" />
       <appender-ref ref="STDOUT" />
       <appender-ref ref="ASYNCDEBUGLOG" /> <!-- Comment this line to disable debug.log -->
       <!--
       <appender-ref ref="LogbackMetrics" />
       -->
     </root>

     <logger name="org.apache.cassandra" level="DEBUG"/>
     <logger name="com.thinkaurelius.thrift" level="ERROR"/>
   </configuration>

.. _cassandra-rackdc:
cassandra-rackdc.properties file
================================
Several :term:`snitch` options use the ``cassandra-rackdc.properties`` configuration file to determine which :term:`datacenters` and racks cluster nodes belong to. Information about the
network topology allows requests to be routed efficiently and replicas to be distributed evenly. The following snitches can be configured here:
- GossipingPropertyFileSnitch
- AWS EC2 single-region snitch
- AWS EC2 multi-region snitch
The GossipingPropertyFileSnitch is recommended for production. This snitch uses the datacenter and rack information configured in a local node's ``cassandra-rackdc.properties``
file and propagates the information to other nodes using :term:`gossip`. It is the default snitch and the settings in this properties file are enabled.
The AWS EC2 snitches are configured for clusters in AWS. These snitches use the ``cassandra-rackdc.properties`` options to designate one of two AWS EC2 datacenter and rack naming conventions:
- legacy: Datacenter name is the part of the availability zone name preceding the last "-" when the zone ends in -1 and includes the number if not -1. Rack name is the portion of the availability zone name following the last "-".
Examples: us-west-1a => dc: us-west, rack: 1a; us-west-2b => dc: us-west-2, rack: 2b;
- standard: Datacenter name is the standard AWS region name, including the number. Rack name is the region plus the availability zone letter.
Examples: us-west-1a => dc: us-west-1, rack: us-west-1a; us-west-2b => dc: us-west-2, rack: us-west-2b;
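
The two naming conventions can be sketched as a small shell function that maps an availability zone name to a datacenter and rack. The function name ``ec2_names`` and its output format are assumptions made for illustration. The snitches implement this mapping internally, and this sketch only mirrors the rules and examples described above.

```shell
# Map an availability zone name to datacenter and rack names.
# Usage: ec2_names <legacy|standard> <availability-zone>
ec2_names() {
    scheme="$1"
    az="$2"
    region="${az%?}"    # drop the trailing zone letter: us-west-2b -> us-west-2
    case "$scheme" in
        standard)
            echo "dc=$region rack=$az"
            ;;
        legacy)
            prefix="${az%-*}"     # text before the last "-": us-west
            suffix="${az##*-}"    # text after the last "-": 2b
            case "$suffix" in
                1?) echo "dc=$prefix rack=$suffix" ;;   # zone ends in -1: omit the number
                *)  echo "dc=$region rack=$suffix" ;;   # otherwise keep the number
            esac
            ;;
        *)
            echo "unknown scheme: $scheme" >&2
            return 1
            ;;
    esac
}

ec2_names legacy us-west-1a      # prints: dc=us-west rack=1a
ec2_names legacy us-west-2b      # prints: dc=us-west-2 rack=2b
ec2_names standard us-west-2b    # prints: dc=us-west-2 rack=us-west-2b
```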
Either snitch can be set to use the local or internal IP address when communication is not across different datacenters.
===========================
GossipingPropertyFileSnitch
===========================
``dc``
------
Name of the datacenter. The value is case-sensitive.
**Default value:** DC1
``rack``
--------
Rack designation. The value is case-sensitive.
**Default value:** RAC1
===========================
AWS EC2 snitch
===========================
``ec2_naming_scheme``
---------------------
Datacenter and rack naming convention. Options are ``legacy`` or ``standard`` (default). **This option is commented out by default.**
**Default value:** standard
.. note:: You must use the ``legacy`` value if you are upgrading a pre-4.0 cluster.
===========================
Either snitch
===========================
``prefer_local``
----------------
Option to use the local or internal IP address when communication is not across different datacenters. **This option is commented out by default.**
**Default value:** true
.. _cassandra-topology:
cassandra-topology.properties file
==================================
The ``PropertyFileSnitch`` :term:`snitch` option uses the ``cassandra-topology.properties`` configuration file to determine which :term:`datacenters` and racks cluster nodes belong to. If other snitches are used, the
:ref:`cassandra-rackdc` must be used. The snitch determines network topology (proximity by rack and datacenter) so that requests are routed efficiently and the database can distribute replicas evenly.
Include every node in the cluster in the properties file, defining your datacenter names as in the keyspace definition. The datacenter and rack names are case-sensitive.
The ``cassandra-topology.properties`` file must be copied identically to every node in the cluster.
===========================
Example
===========================
This example uses three datacenters:
.. code-block:: bash

   # datacenter One
   175.56.12.105=DC1:RAC1
   175.50.13.200=DC1:RAC1
   175.54.35.197=DC1:RAC1
   120.53.24.101=DC1:RAC2
   120.55.16.200=DC1:RAC2
   120.57.102.103=DC1:RAC2

   # datacenter Two
   110.56.12.120=DC2:RAC1
   110.50.13.201=DC2:RAC1
   110.54.35.184=DC2:RAC1
   50.33.23.120=DC2:RAC2
   50.45.14.220=DC2:RAC2
   50.17.10.203=DC2:RAC2

   # datacenter Three
   172.106.12.120=DC3:RAC1
   172.106.12.121=DC3:RAC1
   172.106.12.122=DC3:RAC1

   # default for unknown nodes
   default=DC3:RAC1

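
Because datacenter names are case-sensitive and must match the keyspace definition, it can help to summarize a topology file before copying it to every node. The following sketch counts the listed nodes per datacenter. The function name ``count_nodes_per_dc`` is hypothetical, and the parsing assumes IPv4 addresses (an IPv6 address contains ``:`` and would need different splitting).

```shell
# Count the nodes listed per datacenter in a topology properties file.
# Usage: count_nodes_per_dc <file>
count_nodes_per_dc() {
    awk -F'[=:]' '
        /^[ \t]*(#|$)/ { next }    # skip comments and blank lines
        {
            key = $1
            gsub(/[ \t]/, "", key)
            if (key == "default") next    # skip the fallback entry
            count[$2]++
        }
        END { for (dc in count) print dc, count[dc] }
    ' "$1" | sort
}
```

Run against the example above, this would report six nodes each for DC1 and DC2 and three for DC3. A misspelled or differently cased datacenter name would show up as an unexpected extra line.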
......@@ -22,4 +22,10 @@ This section describes how to configure Apache Cassandra.
.. toctree::
:maxdepth: 1
cassandra_config_file
cass_yaml_file
cass_rackdc_file
cass_env_sh_file
cass_topo_file
cass_cl_archive_file
cass_logback_xml_file
cass_jvm_options_file
......@@ -17,38 +17,48 @@
Configuring Cassandra
---------------------
For running Cassandra on a single node, the default configuration file present at ``./conf/cassandra.yaml`` is enough,
you shouldn't need to change any configuration. However, when you deploy a cluster of nodes, or use clients that
are not on the same host, then there are some parameters that must be changed.
The location of the :term:`Cassandra` configuration files varies, depending on the type of installation:
The Cassandra configuration files can be found in the ``conf`` directory of tarballs. For packages, the configuration
files will be located in ``/etc/cassandra``.
- tarball: ``conf`` directory within the tarball install location
- package: ``/etc/cassandra`` directory
Cassandra's default configuration file, ``cassandra.yaml``, is sufficient to explore a simple single-node :term:`cluster`.
However, anything beyond running a single-node cluster locally requires additional changes to various Cassandra configuration files.
Some examples that require non-default configuration are deploying a multi-node cluster or using clients that are not running on a cluster node.
- ``cassandra.yaml``: the main configuration file for Cassandra
- ``cassandra-env.sh``: environment variables can be set
- ``cassandra-rackdc.properties`` OR ``cassandra-topology.properties``: set rack and datacenter information for a cluster
- ``logback.xml``: logging configuration including logging levels
- ``jvm-*``: a number of JVM configuration files for both the server and clients
- ``commitlog_archiving.properties``: set archiving parameters for the :term:`commitlog`
Two sample configuration files can also be found in ``./conf``:
- ``metrics-reporter-config-sample.yaml``: configuring what the metrics reporter will collect
- ``cqlshrc.sample``: how the CQL shell, cqlsh, can be configured
Main runtime properties
^^^^^^^^^^^^^^^^^^^^^^^
Most of configuration in Cassandra is done via yaml properties that can be set in ``cassandra.yaml``. At a minimum you
Configuring Cassandra is done by setting yaml properties in the ``cassandra.yaml`` file. At a minimum you
should consider setting the following properties:
- ``cluster_name``: the name of your cluster.
- ``seeds``: a comma separated list of the IP addresses of your cluster seeds.
- ``storage_port``: you don't necessarily need to change this but make sure that there are no firewalls blocking this
port.
- ``listen_address``: the IP address of your node, this is what allows other nodes to communicate with this node so it
is important that you change it. Alternatively, you can set ``listen_interface`` to tell Cassandra which interface to
use, and consecutively which address to use. Set only one, not both.
- ``native_transport_port``: as for storage\_port, make sure this port is not blocked by firewalls as clients will
communicate with Cassandra on this port.
- ``cluster_name``: Set the name of your cluster.
- ``seeds``: A comma separated list of the IP addresses of your cluster :term:`seed nodes`.
- ``storage_port``: Check that you don't have the default port of 7000 blocked by a firewall.
- ``listen_address``: The :term:`listen address` is the IP address of a node that allows it to communicate with other nodes in the cluster. Set to ``localhost`` by default. Alternatively, you can set ``listen_interface`` to tell Cassandra which interface to use, and consequently which address to use. Set one property, not both.
- ``native_transport_port``: Check that you don't have the default port of 9042 blocked by a firewall, so that clients like cqlsh can communicate with Cassandra on this port.
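
As a quick way to review these properties on a node, the sketch below prints their current values from a ``cassandra.yaml`` file. The function name ``show_core_settings`` is hypothetical, and the path in the usage comment is an assumption that depends on the type of installation.

```shell
# Print the core settings from a cassandra.yaml file.
# Commented-out lines are skipped because they start with "#".
show_core_settings() {
    grep -E '^[[:space:]]*(- )?(cluster_name|seeds|storage_port|listen_address|listen_interface|native_transport_port):' "$1"
}

# Example for a package installation:
#   show_core_settings /etc/cassandra/cassandra.yaml
```

If both ``listen_address`` and ``listen_interface`` appear in the output, only one of them should be kept.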
Changing the location of directories
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The following yaml properties control the location of directories:
- ``data_file_directories``: one or more directories where data files are located.
- ``commitlog_directory``: the directory where commitlog files are located.
- ``saved_caches_directory``: the directory where saved caches are located.
- ``hints_directory``: the directory where hints are located.
- ``data_file_directories``: One or more directories where data files, like :term:`SSTables`, are located.
- ``commitlog_directory``: The directory where commitlog files are located.
- ``saved_caches_directory``: The directory where saved caches are located.
- ``hints_directory``: The directory where :term:`hints` are located.
For performance reasons, if you have multiple disks, consider putting commitlog and data files on different disks.
......@@ -56,12 +66,15 @@ Environment variables
^^^^^^^^^^^^^^^^^^^^^
JVM-level settings such as heap size can be set in ``cassandra-env.sh``. You can add any additional JVM command line
argument to the ``JVM_OPTS`` environment variable; when Cassandra starts these arguments will be passed to the JVM.
argument to the ``JVM_OPTS`` environment variable; when Cassandra starts, these arguments will be passed to the JVM.
Logging
^^^^^^^
The logger in use is logback. You can change logging properties by editing ``logback.xml``. By default it will log at
INFO level into a file called ``system.log`` and at debug level into a file called ``debug.log``. When running in the
foreground, it will also log at INFO level to the console.
The default logger is ``logback``. By default it will log:
- **INFO** level in ``system.log``
- **DEBUG** level in ``debug.log``
When running in the foreground, it will also log at INFO level to the console. You can change logging properties by editing ``logback.xml`` or by running the ``nodetool setlogginglevel`` command.
Glossary
========
.. glossary::
Cassandra
Apache Cassandra is a distributed, highly available, eventually consistent NoSQL open-source database.
cluster
Two or more database instances that exchange messages using the gossip protocol.
commitlog
A file to which the database appends changed data for recovery in the event of a hardware failure.