Getting to Know the Cassandra Codebase
Cozy with Cassandra
Getting to know the Cassandra Codebase
Gary Dusbabek • Rackspace
@gdusbabek
Cassandra Summit • Mission Bay Conference Center • San Francisco • 10 August 2010

Outline
  • Code Themes
  • Startup Sequence
  • Key Classes
  • Read Path
  • Write Path
  • Stages & Threading
  • Bootstrap & Streaming
  • Tests & IDE considerations
  • Adding API methods
  • Questions
Themes and Patterns: Layers
Themes and Patterns: Services
Themes and Patterns: Singletons and statics
Themes and Patterns: Stages & Thread pools
Startup Process
CassandraDaemon
  • Loads configuration
  • Transport initialization
  • Storage (Keyspace initialization)
  • CommitLog recovery
  • StorageService.initServer()
  • Initializes CassandraServer
  • Passes it off to transport

CassandraServer
  • Implements IDL interface methods (cassandra.thrift, cassandra.genavro)
  • Good place to start diving when adding or troubleshooting API methods

Configuration
  • DatabaseDescriptor
  • Via CassandraDaemon.setup()
  • Looks for config path, loads yaml
  • Doesn't spin anything up
  • Defines system tables
  • KS and CF described by CFMetaData and KSMetaData
Code: Some Classes
Main Controllers
  • End with *Service or *Manager
  • StorageService, MessagingService
  • CompactionManager, HintedHandoffManager, StageManager, StreamInManager

StorageProxy
  • Put & Get methods
  • Collection of static methods
  • Merges local and distributed operations
  • Tracks latency
  • Exposed via StorageProxyMBean

StorageService
  • initServer(): Starts services
  • Registers verb handlers (in MessagingService)
  • Main event responders
  • Repository of replication strategies and TokenMetadata
  • Ring topology & token information
MessagingService
  • Verb handlers reside here
  • Sets up socket listeners
  • Gateway for outbound messages
    • MS.sendRR()
    • MS.sendOneWay()
  • Inbound too
    • MS.receive()
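The verb-handler idea on this slide can be sketched as a small registry: each message type (verb) maps to a handler, and the inbound path looks the handler up and hands it the payload. This is a minimal illustration only. The class, enum, and method names below are hypothetical and are not Cassandra's actual signatures.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of verb dispatch; not Cassandra's real MessagingService.
final class VerbDispatchSketch {
    // Assumed verbs, chosen only for illustration.
    enum Verb { MUTATION, READ, STREAM_REQUEST }

    // A handler reacts to one kind of inbound message.
    interface VerbHandler {
        void doVerb(String payload);
    }

    private final Map<Verb, VerbHandler> handlers = new EnumMap<>(Verb.class);

    // Called once at startup for each verb the node understands.
    void registerVerbHandler(Verb verb, VerbHandler handler) {
        handlers.put(verb, handler);
    }

    // Inbound path: find the handler for the verb and pass it the payload.
    // Returns false when no handler is registered for that verb.
    boolean receive(Verb verb, String payload) {
        VerbHandler handler = handlers.get(verb);
        if (handler == null) {
            return false;
        }
        handler.doVerb(payload);
        return true;
    }

    public static void main(String[] args) {
        VerbDispatchSketch ms = new VerbDispatchSketch();
        List<String> reads = new ArrayList<>();
        ms.registerVerbHandler(Verb.READ, reads::add);
        ms.receive(Verb.READ, "row-key-1");
        System.out.println("handled reads: " + reads);
    }
}
```

Keeping the registry in one place is what lets a service register its handlers at startup without the socket code knowing anything about them.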
Table & ColumnFamilyStore
  • Also RowMutation
  • Low-level storage operations
  • o.a.c.db.*
  • SSTable
  • Local operations
Read+Write Paths
Reading: Socket -> CassandraServer
  • Permissions
  • Request validation
  • Marshalling

Reading: StorageProxy
  • Ranges
  • Collectors
  • Local & remote branches
Reading: StorageProxy local
  • Table, ColumnFamilyStore
  • CFS
  • Make QueryFilter
  • Query Memtables
  • Query SSTables
  • Coalesce in iterators
  • o.a.c.db package
  • o.a.c.db.filter
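The "query memtables, query SSTables, coalesce" steps can be illustrated with a simplified merge in which, for each column name, the version with the highest timestamp wins. The real code works through iterators and a QueryFilter. The types below are hypothetical stand-ins, and plain lists take the place of memtables and SSTables.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of coalescing one row from several local sources.
final class LocalReadSketch {
    static final class Column {
        final String name;
        final String value;
        final long timestamp;

        Column(String name, String value, long timestamp) {
            this.name = name;
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    // Each inner list stands in for one memtable or SSTable holding columns
    // of the same row. The newest timestamp wins for each column name.
    static Map<String, Column> coalesce(List<List<Column>> sources) {
        Map<String, Column> merged = new TreeMap<>();
        for (List<Column> source : sources) {
            for (Column c : source) {
                Column existing = merged.get(c.name);
                if (existing == null || c.timestamp > existing.timestamp) {
                    merged.put(c.name, c);
                }
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<Column> memtable = List.of(new Column("email", "new@example.com", 20));
        List<Column> sstable = List.of(
                new Column("email", "old@example.com", 10),
                new Column("name", "gary", 5));
        for (Column c : coalesce(List.of(memtable, sstable)).values()) {
            System.out.println(c.name + "=" + c.value);
        }
    }
}
```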
Reading: StorageProxy remote
  • read command
  • Response handler
  • Send to remote nodes

Writing: Socket -> CassandraServer
  • Validation
  • Convert to Mutation (IDL object)
  • Penalties!

Writing: StorageProxy
  • blocking/non-blocking mutate
  • local/remote branch
  • RowMutation
    • one ColumnFamily per
  • ColumnFamily
    • Collection of column modifications
Writing: RM.apply -> Table.apply
  • Write to CL (CommitLog)
  • Iterate over RM CFs
  • CFS.apply()
  • Overwrites results on pre-existing column families
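The ordering on this slide (log first, then apply per column family) can be sketched as follows. This is a toy model under stated assumptions: the commit log is just a list, each column family store is an in-memory map, and every name is hypothetical rather than taken from the Cassandra source.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of applying a row mutation locally.
final class TableApplySketch {
    // Stand-in for the commit log: one entry per applied mutation.
    final List<String> commitLog = new ArrayList<>();
    // Column family name -> ("rowKey/columnName" -> value).
    final Map<String, Map<String, String>> stores = new HashMap<>();

    // mutation maps each column family name to its column modifications.
    void apply(String rowKey, Map<String, Map<String, String>> mutation) {
        // 1. Record the mutation in the log before touching any store.
        commitLog.add(rowKey);
        // 2. Iterate over the mutation's column families and apply each one.
        for (Map.Entry<String, Map<String, String>> cf : mutation.entrySet()) {
            Map<String, String> store =
                    stores.computeIfAbsent(cf.getKey(), name -> new HashMap<>());
            for (Map.Entry<String, String> col : cf.getValue().entrySet()) {
                // A later write to the same column overwrites the earlier value.
                store.put(rowKey + "/" + col.getKey(), col.getValue());
            }
        }
    }

    public static void main(String[] args) {
        TableApplySketch table = new TableApplySketch();
        table.apply("k1", Map.of("Users", Map.of("name", "gary")));
        System.out.println(table.stores);
    }
}
```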
Writing
  • RM is serialized into a Message and sent to other nodes
  • Waits for ACKs depending on CL (ConsistencyLevel)
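Waiting for ACKs according to the consistency level can be sketched with a countdown latch: the coordinator decides how many replies to block for, and each ACK counts the latch down. Only three levels are shown, and the class and method names are hypothetical. Quorum is computed as replication factor / 2 + 1.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of blocking for write acknowledgements.
final class WriteAckSketch {
    // A subset of consistency levels, for illustration only.
    enum ConsistencyLevel { ONE, QUORUM, ALL }

    // Number of replica ACKs the coordinator waits for.
    static int blockFor(ConsistencyLevel cl, int replicationFactor) {
        switch (cl) {
            case ONE:
                return 1;
            case QUORUM:
                return replicationFactor / 2 + 1;
            case ALL:
                return replicationFactor;
            default:
                throw new IllegalArgumentException("unknown level: " + cl);
        }
    }

    static final class ResponseHandler {
        private final CountDownLatch latch;

        ResponseHandler(int blockFor) {
            this.latch = new CountDownLatch(blockFor);
        }

        // Called once per replica acknowledgement.
        void ack() {
            latch.countDown();
        }

        // Returns true if enough ACKs arrived before the timeout.
        boolean await(long timeoutMillis) {
            try {
                return latch.await(timeoutMillis, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        ResponseHandler handler = new ResponseHandler(blockFor(ConsistencyLevel.QUORUM, 3));
        handler.ack();
        handler.ack();
        System.out.println("quorum reached: " + handler.await(100));
    }
}
```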
Stages & Threading
Stages
  • SEDA
  • http://www.eecs.harvard.edu/~mdw/papers/seda-sosp01.pdf
  • o.a.c.concurrent.StageManager
  • Read, mutation, stream, gossip, response, anti entropy, load balance, migration
  • Thread pools that consume tasks

Threads
  • MessagingService.listen() spawns thread.
  • Each incoming connection spawns a new short-lived thread (IncomingTcpConnection)
  • Non-stream ops go to MS.messageDeserializerExecutor_
  • Stream ops handled there.
  • Anti-entropy repair
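A stage in the SEDA sense is a named thread pool that consumes tasks of one kind. A minimal sketch of that idea, using the standard java.util.concurrent executors, follows. The class name, the three stage names, and the pool sizing are assumptions made for illustration and do not reflect how StageManager is actually configured.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of stages as named thread pools.
final class StageSketch {
    // A subset of stage names, for illustration only.
    enum Stage { READ, MUTATION, GOSSIP }

    private final Map<Stage, ExecutorService> pools = new EnumMap<>(Stage.class);

    StageSketch(int threadsPerStage) {
        for (Stage stage : Stage.values()) {
            AtomicInteger counter = new AtomicInteger();
            // Name threads after their stage so thread dumps show which stage is busy.
            pools.put(stage, Executors.newFixedThreadPool(threadsPerStage, runnable -> {
                Thread t = new Thread(runnable, stage.name() + "-" + counter.incrementAndGet());
                t.setDaemon(true);
                return t;
            }));
        }
    }

    // Hands the task to the stage's pool and blocks until it completes.
    <T> T runAndWait(Stage stage, Callable<T> task) {
        try {
            return pools.get(stage).submit(task).get();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    void shutdown() {
        for (ExecutorService pool : pools.values()) {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        StageSketch stages = new StageSketch(2);
        String worker = stages.runAndWait(Stage.READ, () -> Thread.currentThread().getName());
        System.out.println("read task ran on " + worker);
        stages.shutdown();
    }
}
```

Separate pools per stage mean a backlog of one kind of work (say, mutations) does not starve another (say, gossip).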
Bootstrapping!
  • Any interest?
  • //FIXME TODO CRAP
<=0.6 Bootstrapping
  • A wants data, B has data.
  • StreamingRequestMessage A->B
  • Handled on B by StreamRequestVerbHandler
  • For each range StreamOut.transferRanges()
  • Flush, anticompaction
  • StreamInitiateMessage B->A for each range transfer
  • Meanwhile, back on A…
  • StreamInitiateVerbHandler gets the SIM from B, does some nesting.
  • StreamInitiateDone A->B
  • Back to B…
  • StreamInitiateDoneHandler gets the SID from A
  • Calls StreamOutManager.startNext() which sends a single file to A
  • MessagingService on A picks this up and the file is streamed.
  • SSTable is created
  • STREAM_FINISHED A->B
  • B gets rid of the file, calls SOM.startNext()
0.7 Bootstrapping
  • A wants data, B has data
  • StreamRequestMessage A->B
  • On B, StreamRequestVerbHandler
  • If single file, sends it.
  • If range, StreamOut.transferRangesForRequest()
  • Send next file (first will contain metadata about all files)
  • On A, IncomingStreamReader.read()
  • Data is received, SSTable created
  • Ack, request next file
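The 0.7 exchange above is essentially "send one file, wait for the ack, send the next", with the list of all files announced alongside the first one. A single-process simulation of that loop might look like this. It models only the control flow. File names are plain strings, no network or disk I/O happens, and all class and method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical simulation of file-at-a-time streaming from B to A.
final class StreamSessionSketch {
    // Node B: holds the files still waiting to be sent.
    static final class Source {
        private final Deque<String> pending;

        Source(List<String> files) {
            this.pending = new ArrayDeque<>(files);
        }

        // Metadata about every file in the session, sent with the first file.
        List<String> manifest() {
            return new ArrayList<>(pending);
        }

        // Called at the start and after each ack. Returns null when done.
        String next() {
            return pending.poll();
        }
    }

    // Runs one session and returns the files node A received, in order.
    static List<String> transfer(List<String> files) {
        Source b = new Source(files);
        List<String> manifest = b.manifest();
        List<String> received = new ArrayList<>();
        String file = b.next();
        while (file != null) {
            received.add(file);  // A reads the data and creates an SSTable
            file = b.next();     // A's ack doubles as the request for the next file
        }
        if (!received.equals(manifest)) {
            throw new IllegalStateException("session ended before all files arrived");
        }
        return received;
    }

    public static void main(String[] args) {
        System.out.println(transfer(List.of("Users-1-Data.db", "Users-2-Data.db")));
    }
}
```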
Tests
  • Testable & Untestable
  • Unit tests
    • ant clean build test
  • System tests
    • ant gen-thrift-py
    • nosetests test/system/test_thrift_server.py

IDE
  • Configuration file must be in the classpath
  • Treat as source
  • lib vs build/lib
  • Log at debug
IDE
  -ea -Xms128M -Xmx2G
  -Dcom.sun.management.jmxremote.port=8081
  -Dcom.sun.management.jmxremote.ssl=false
  -Dcom.sun.management.jmxremote.authenticate=false
  -Dcassandra-foreground=yes
  -Dlog4j.configuration=log4j-server.properties
  -Dmx4jport=9081
Adding API methods
  • Same goes for modifying
  • Define method and structures in IDL
    • interface/cassandra.thrift
  • Regenerate files
    • ant gen-thrift-java gen-thrift-py
  • Implement methods in o.a.c.thrift.CassandraServer
  • Create a system test (test/system/test_thrift_server.py)
Questions?
gdusbabek@gmail.com
@gdusbabek

Editor's Notes

  • #3 Talk about Cassandra processes and the classes that relate to them.
  • #4 Data operated on, then passed off to another layer. Evident on R/W path. Onion analogy.
  • #5 Disjoint. Coupled when necessary.
  • #6 Model of class/object design. Suck it up. Not a class project.
  • #15 Services; Gossiper; MessagingService; MigrationManager; Bootstrap; Preloaded cache; Handle to server-related tasks.
  • #28 layers