Documentation ¶
Overview ¶
Package storage provides access to the Store and Range abstractions. Each Cockroach node handles one or more stores, each of which multiplexes to one or more ranges, identified by [start, end) keys. Ranges are contiguous regions of the keyspace. Each range implements an instance of the Raft consensus algorithm to synchronize participating range replicas.
Each store is represented by a single engine.Engine instance. The ranges hosted by a store all have access to the same engine, but write to only a range-limited keyspace within it. Ranges access the underlying engine via the MVCC interface, which provides historical versioned values.
Package storage is a generated protocol buffer package. It is generated from these files: cockroach/storage/raft.proto cockroach/storage/rangetree.proto It has these top-level messages: RaftMessageRequest RaftMessageResponse ConfChangeContext RangeTree RangeTreeNode
Example (Rebalancing) ¶
stopper := stop.NewStopper() defer stopper.Stop() // Model a set of stores in a cluster, // randomly adding / removing stores and adding bytes. g := gossip.New(nil, nil, stopper) // Have to call g.SetNodeID before call g.AddInfo g.SetNodeID(roachpb.NodeID(1)) sp := NewStorePool( g, hlc.NewClock(hlc.UnixNano), nil, /* reservationsEnabled */ true, TestTimeUntilStoreDeadOff, stopper, ) alloc := MakeAllocator(sp, AllocatorOptions{AllowRebalance: true, Deterministic: true}) var wg sync.WaitGroup g.RegisterCallback(gossip.MakePrefixPattern(gossip.KeyStorePrefix), func(_ string, _ roachpb.Value) { wg.Done() }) const generations = 100 const nodes = 20 // Initialize testStores. var testStores [nodes]testStore for i := 0; i < len(testStores); i++ { testStores[i].StoreID = roachpb.StoreID(i) testStores[i].Node = roachpb.NodeDescriptor{NodeID: roachpb.NodeID(i)} testStores[i].Capacity = roachpb.StoreCapacity{Capacity: 1 << 30, Available: 1 << 30} } // Initialize the cluster with a single range. testStores[0].add(alloc.randGen.Int63n(1 << 20)) for i := 0; i < generations; i++ { // First loop through test stores and add data. wg.Add(len(testStores)) for j := 0; j < len(testStores); j++ { // Add a pretend range to the testStore if there's already one. if testStores[j].Capacity.RangeCount > 0 { testStores[j].add(alloc.randGen.Int63n(1 << 20)) } if err := g.AddInfoProto(gossip.MakeStoreKey(roachpb.StoreID(j)), &testStores[j].StoreDescriptor, 0); err != nil { panic(err) } } wg.Wait() // Next loop through test stores and maybe rebalance. for j := 0; j < len(testStores); j++ { ts := &testStores[j] if alloc.ShouldRebalance(ts.StoreID) { target := alloc.RebalanceTarget(ts.StoreID, roachpb.Attributes{}, []roachpb.ReplicaDescriptor{{NodeID: ts.Node.NodeID, StoreID: ts.StoreID}}) if target != nil { testStores[j].rebalance(&testStores[int(target.StoreID)], alloc.randGen.Int63n(1<<20)) } } } // Output store capacities as hexadecimal 2-character values. if i%(generations/50) == 0 { var maxBytes int64 for j := 0; j < len(testStores); j++ { bytes := testStores[j].Capacity.Capacity - testStores[j].Capacity.Available if bytes > maxBytes { maxBytes = bytes } } if maxBytes > 0 { for j := 0; j < len(testStores); j++ { endStr := " " if j == len(testStores)-1 { endStr = "" } bytes := testStores[j].Capacity.Capacity - testStores[j].Capacity.Available fmt.Printf("%03d%s", (999*bytes)/maxBytes, endStr) } fmt.Printf("\n") } } } var totBytes int64 var totRanges int32 for i := 0; i < len(testStores); i++ { totBytes += testStores[i].Capacity.Capacity - testStores[i].Capacity.Available totRanges += testStores[i].Capacity.RangeCount } fmt.Printf("Total bytes=%d, ranges=%d\n", totBytes, totRanges)
Output: 999 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 999 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 999 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 999 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 999 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 999 000 000 000 000 000 000 000 000 000 045 140 000 000 000 000 000 105 000 000 999 014 143 000 000 000 000 039 017 000 112 071 000 088 009 000 097 134 000 151 999 196 213 000 000 000 143 098 210 039 262 260 077 139 078 087 237 316 281 267 999 394 368 391 000 393 316 356 364 263 474 262 214 321 345 374 403 445 574 220 999 337 426 577 023 525 459 426 229 315 495 327 310 370 363 423 390 473 587 308 999 481 529 533 132 563 519 496 396 363 636 337 414 408 425 533 445 605 559 405 999 572 585 507 256 609 570 586 513 341 660 347 544 443 488 525 446 596 556 462 999 580 575 603 325 636 590 549 495 337 698 386 663 526 518 511 517 572 546 533 999 576 601 637 374 629 573 558 520 391 684 446 692 555 510 461 552 593 568 564 999 573 636 671 441 643 619 629 628 452 705 525 795 590 542 525 589 658 589 655 999 585 625 651 467 686 606 662 611 508 654 516 746 594 542 528 591 646 569 642 999 636 690 728 501 704 638 700 619 539 688 555 738 592 556 568 659 669 602 649 999 655 749 773 519 790 713 781 698 604 758 601 755 634 580 661 716 735 607 660 999 648 716 726 549 813 748 766 693 606 784 568 749 655 579 642 692 711 587 632 999 688 734 731 553 805 736 779 701 575 763 562 722 647 599 631 691 732 598 608 999 679 770 719 590 815 754 799 687 613 748 540 715 664 590 638 703 720 621 588 999 736 775 724 614 813 771 829 703 679 782 560 754 692 624 658 756 763 636 643 999 759 792 737 688 847 782 872 761 695 841 617 756 730 607 664 762 807 677 666 999 793 837 754 704 876 803 897 753 742 880 639 758 766 653 684 785 850 720 670 999 815 864 778 735 921 843 927 778 752 896 696 775 796 698 681 775 859 730 693 999 827 876 759 759 911 838 938 781 798 920 708 778 794 698 711 804 870 732 710 999 815 893 733 790 924 849 940 755 777 901 720 794 832 704 721 834 851 722 748 999 820 905 772 807 941 884 938 781 788 888 738 835 849 735 742 865 884 743 791 999 828 889 768 828 939 865 936 789 805 913 751 841 860 751 759 895 889 730 814 999 829 893 794 840 933 883 943 805 830 929 735 842 871 778 788 886 912 746 845 999 848 892 820 824 963 913 978 832 828 952 755 860 890 784 814 905 905 755 855 999 847 880 846 847 963 939 984 851 835 958 777 862 880 799 829 912 895 772 870 999 850 886 859 871 950 921 998 847 823 925 759 877 861 787 810 908 915 798 840 982 854 891 854 900 956 945 999 833 804 929 767 896 861 781 797 911 932 791 855 961 849 884 846 881 949 928 999 829 796 906 768 868 858 797 804 883 897 774 834 965 863 924 874 903 988 953 999 864 831 924 786 876 886 821 804 903 940 799 843 963 873 936 880 915 997 966 999 885 832 935 799 891 919 854 801 916 953 802 866 951 886 938 873 900 990 972 999 898 822 915 795 871 917 853 798 928 953 779 850 932 880 939 866 897 999 948 970 884 837 912 805 877 893 866 807 922 933 791 846 925 896 935 885 899 999 963 965 886 858 897 820 894 876 876 811 918 921 793 856 926 881 933 876 896 999 952 942 857 859 878 812 898 884 883 791 920 894 783 853 951 890 947 898 919 999 959 952 863 871 895 845 902 898 893 816 934 920 790 881 962 895 959 919 921 999 982 951 883 877 901 860 911 910 899 835 949 923 803 883 957 886 970 905 915 999 970 974 888 894 924 879 938 930 909 847 955 937 830 899 941 881 958 889 914 999 957 953 885 890 900 870 946 919 885 822 950 927 832 875 937 888 962 897 934 999 963 950 902 900 905 890 952 920 895 831 963 930 852 872 916 888 967 881 924 999 970 946 912 890 901 889 958 910 911 830 966 928 834 866 900 859 959 877 895 999 955 931 893 868 894 881 929 893 885 813 937 909 819 849 902 857 960 875 896 999 944 929 911 867 911 895 946 897 897 812 926 921 815 859 902 855 951 867 893 999 949 938 901 867 911 892 949 898 903 803 935 930 809 868 Total bytes=909881714, ranges=1745
Index ¶
- Constants
- Variables
- func ComputeStatsForRange(d *roachpb.RangeDescriptor, e engine.Reader, nowNanos int64) (enginepb.MVCCStats, error)
- func DecodeRaftCommand(data []byte) (commandID string, command []byte)
- func DeleteRange(txn *client.Txn, b *client.Batch, key roachpb.RKey) error
- func InsertRange(txn *client.Txn, b *client.Batch, key roachpb.RKey) error
- func IterateRangeDescriptors(eng engine.Reader, fn func(desc roachpb.RangeDescriptor) (bool, error)) error
- func RegisterMultiRaftServer(s *grpc.Server, srv MultiRaftServer)
- func SetupRangeTree(ctx context.Context, batch engine.Batch, ms *enginepb.MVCCStats, ...) error
- func TrackRaftProtos() func() []reflect.Type
- type AbortCache
- func (sc *AbortCache) ClearData(e engine.Engine) error
- func (sc *AbortCache) CopyFrom(ctx context.Context, e engine.ReadWriter, ms *enginepb.MVCCStats, ...) (int, error)
- func (sc *AbortCache) CopyInto(e engine.ReadWriter, ms *enginepb.MVCCStats, destRangeID roachpb.RangeID) (int, error)
- func (sc *AbortCache) Del(ctx context.Context, e engine.ReadWriter, ms *enginepb.MVCCStats, ...) error
- func (sc *AbortCache) Get(ctx context.Context, e engine.Reader, txnID *uuid.UUID, ...) (bool, error)
- func (sc *AbortCache) Iterate(ctx context.Context, e engine.Reader, ...)
- func (sc *AbortCache) Put(ctx context.Context, e engine.ReadWriter, ms *enginepb.MVCCStats, ...) error
- type Allocator
- func (a *Allocator) AllocateTarget(required roachpb.Attributes, existing []roachpb.ReplicaDescriptor, ...) (*roachpb.StoreDescriptor, error)
- func (a *Allocator) ComputeAction(zone config.ZoneConfig, desc *roachpb.RangeDescriptor) (AllocatorAction, float64)
- func (a Allocator) RebalanceTarget(storeID roachpb.StoreID, required roachpb.Attributes, ...) *roachpb.StoreDescriptor
- func (a Allocator) RemoveTarget(existing []roachpb.ReplicaDescriptor) (roachpb.ReplicaDescriptor, error)
- func (a *Allocator) ShouldRebalance(storeID roachpb.StoreID) bool
- type AllocatorAction
- type AllocatorOptions
- type CommandQueue
- type ConfChangeContext
- func (*ConfChangeContext) Descriptor() ([]byte, []int)
- func (m *ConfChangeContext) Marshal() (data []byte, err error)
- func (m *ConfChangeContext) MarshalTo(data []byte) (int, error)
- func (*ConfChangeContext) ProtoMessage()
- func (m *ConfChangeContext) Reset()
- func (m *ConfChangeContext) Size() (n int)
- func (m *ConfChangeContext) String() string
- func (m *ConfChangeContext) Unmarshal(data []byte) error
- type GCInfo
- type MultiRaftClient
- type MultiRaftServer
- type MultiRaft_RaftMessageClient
- type MultiRaft_RaftMessageServer
- type NodeAddressResolver
- type NotBootstrappedError
- type RaftMessageRequest
- func (*RaftMessageRequest) Descriptor() ([]byte, []int)
- func (*RaftMessageRequest) GetUser() string
- func (m *RaftMessageRequest) Marshal() (data []byte, err error)
- func (m *RaftMessageRequest) MarshalTo(data []byte) (int, error)
- func (*RaftMessageRequest) ProtoMessage()
- func (m *RaftMessageRequest) Reset()
- func (m *RaftMessageRequest) Size() (n int)
- func (m *RaftMessageRequest) String() string
- func (m *RaftMessageRequest) Unmarshal(data []byte) error
- type RaftMessageResponse
- func (*RaftMessageResponse) Descriptor() ([]byte, []int)
- func (m *RaftMessageResponse) Marshal() (data []byte, err error)
- func (m *RaftMessageResponse) MarshalTo(data []byte) (int, error)
- func (*RaftMessageResponse) ProtoMessage()
- func (m *RaftMessageResponse) Reset()
- func (m *RaftMessageResponse) Size() (n int)
- func (m *RaftMessageResponse) String() string
- func (m *RaftMessageResponse) Unmarshal(data []byte) error
- type RaftSnapshotStatus
- type RaftTransport
- type RangeEventLogType
- type RangeTree
- func (*RangeTree) Descriptor() ([]byte, []int)
- func (m *RangeTree) Marshal() (data []byte, err error)
- func (m *RangeTree) MarshalTo(data []byte) (int, error)
- func (*RangeTree) ProtoMessage()
- func (m *RangeTree) Reset()
- func (m *RangeTree) Size() (n int)
- func (m *RangeTree) String() string
- func (m *RangeTree) Unmarshal(data []byte) error
- type RangeTreeNode
- func (*RangeTreeNode) Descriptor() ([]byte, []int)
- func (m *RangeTreeNode) Marshal() (data []byte, err error)
- func (m *RangeTreeNode) MarshalTo(data []byte) (int, error)
- func (*RangeTreeNode) ProtoMessage()
- func (m *RangeTreeNode) Reset()
- func (m *RangeTreeNode) Size() (n int)
- func (m *RangeTreeNode) String() string
- func (m *RangeTreeNode) Unmarshal(data []byte) error
- type Replica
- func (r *Replica) AdminMerge(ctx context.Context, args roachpb.AdminMergeRequest, ...) (roachpb.AdminMergeResponse, *roachpb.Error)
- func (r *Replica) AdminSplit(ctx context.Context, args roachpb.AdminSplitRequest, ...) (roachpb.AdminSplitResponse, *roachpb.Error)
- func (r *Replica) BeginTransaction(ctx context.Context, batch engine.ReadWriter, ms *enginepb.MVCCStats, ...) (roachpb.BeginTransactionResponse, error)
- func (r *Replica) ChangeFrozen(ctx context.Context, batch engine.ReadWriter, ms *enginepb.MVCCStats, ...) (roachpb.ChangeFrozenResponse, error)
- func (r *Replica) ChangeReplicas(changeType roachpb.ReplicaChangeType, replica roachpb.ReplicaDescriptor, ...) error
- func (r *Replica) CheckConsistency(args roachpb.CheckConsistencyRequest, desc *roachpb.RangeDescriptor) (roachpb.CheckConsistencyResponse, *roachpb.Error)
- func (r *Replica) ComputeChecksum(ctx context.Context, batch engine.ReadWriter, ms *enginepb.MVCCStats, ...) (roachpb.ComputeChecksumResponse, error)
- func (r *Replica) ConditionalPut(ctx context.Context, batch engine.ReadWriter, ms *enginepb.MVCCStats, ...) (roachpb.ConditionalPutResponse, error)
- func (r *Replica) ContainsKey(key roachpb.Key) bool
- func (r *Replica) ContainsKeyRange(start, end roachpb.Key) bool
- func (r *Replica) Delete(ctx context.Context, batch engine.ReadWriter, ms *enginepb.MVCCStats, ...) (roachpb.DeleteResponse, error)
- func (r *Replica) DeleteRange(ctx context.Context, batch engine.ReadWriter, ms *enginepb.MVCCStats, ...) (roachpb.DeleteRangeResponse, error)
- func (r *Replica) Desc() *roachpb.RangeDescriptor
- func (r *Replica) Destroy(origDesc roachpb.RangeDescriptor) error
- func (r *Replica) EndTransaction(ctx context.Context, batch engine.ReadWriter, ms *enginepb.MVCCStats, ...) (roachpb.EndTransactionResponse, []roachpb.Intent, error)
- func (r *Replica) Entries(lo, hi, maxBytes uint64) ([]raftpb.Entry, error)
- func (r *Replica) FirstIndex() (uint64, error)
- func (r *Replica) GC(ctx context.Context, batch engine.ReadWriter, ms *enginepb.MVCCStats, ...) (roachpb.GCResponse, error)
- func (r *Replica) Get(ctx context.Context, batch engine.ReadWriter, h roachpb.Header, ...) (roachpb.GetResponse, []roachpb.Intent, error)
- func (r *Replica) GetFirstIndex() (uint64, error)
- func (r *Replica) GetMVCCStats() enginepb.MVCCStats
- func (r *Replica) GetMaxBytes() int64
- func (r *Replica) GetReplica() (*roachpb.ReplicaDescriptor, error)
- func (r *Replica) GetSnapshot() (raftpb.Snapshot, error)
- func (r *Replica) HeartbeatTxn(ctx context.Context, batch engine.ReadWriter, ms *enginepb.MVCCStats, ...) (roachpb.HeartbeatTxnResponse, error)
- func (r *Replica) Increment(ctx context.Context, batch engine.ReadWriter, ms *enginepb.MVCCStats, ...) (roachpb.IncrementResponse, error)
- func (r *Replica) InitPut(ctx context.Context, batch engine.ReadWriter, ms *enginepb.MVCCStats, ...) (roachpb.InitPutResponse, error)
- func (r *Replica) InitialState() (raftpb.HardState, raftpb.ConfState, error)
- func (r *Replica) IsFirstRange() bool
- func (r *Replica) IsInitialized() bool
- func (r *Replica) LastIndex() (uint64, error)
- func (r *Replica) LeaderLease(ctx context.Context, batch engine.ReadWriter, ms *enginepb.MVCCStats, ...) (roachpb.LeaderLeaseResponse, error)
- func (r *Replica) Less(i btree.Item) bool
- func (r *Replica) Merge(ctx context.Context, batch engine.ReadWriter, ms *enginepb.MVCCStats, ...) (roachpb.MergeResponse, error)
- func (r *Replica) PushTxn(ctx context.Context, batch engine.ReadWriter, ms *enginepb.MVCCStats, ...) (roachpb.PushTxnResponse, error)
- func (r *Replica) Put(ctx context.Context, batch engine.ReadWriter, ms *enginepb.MVCCStats, ...) (roachpb.PutResponse, error)
- func (r *Replica) RaftStatus() *raft.Status
- func (r *Replica) RangeLookup(ctx context.Context, batch engine.ReadWriter, h roachpb.Header, ...) (roachpb.RangeLookupResponse, []roachpb.Intent, error)
- func (r *Replica) ReplicaDescriptor(replicaID roachpb.ReplicaID) (roachpb.ReplicaDescriptor, error)
- func (r *Replica) ResolveIntent(ctx context.Context, batch engine.ReadWriter, ms *enginepb.MVCCStats, ...) (roachpb.ResolveIntentResponse, error)
- func (r *Replica) ResolveIntentRange(ctx context.Context, batch engine.ReadWriter, ms *enginepb.MVCCStats, ...) (roachpb.ResolveIntentRangeResponse, error)
- func (r *Replica) ReverseScan(ctx context.Context, batch engine.ReadWriter, h roachpb.Header, ...) (roachpb.ReverseScanResponse, []roachpb.Intent, error)
- func (r *Replica) Scan(ctx context.Context, batch engine.ReadWriter, h roachpb.Header, ...) (roachpb.ScanResponse, []roachpb.Intent, error)
- func (r *Replica) Send(ctx context.Context, ba roachpb.BatchRequest) (*roachpb.BatchResponse, *roachpb.Error)
- func (r *Replica) SetMaxBytes(maxBytes int64)
- func (r *Replica) Snapshot() (raftpb.Snapshot, error)
- func (r *Replica) State() storagebase.RangeInfo
- func (r *Replica) String() string
- func (r *Replica) Term(i uint64) (uint64, error)
- func (r *Replica) TruncateLog(ctx context.Context, batch engine.ReadWriter, ms *enginepb.MVCCStats, ...) (roachpb.TruncateLogResponse, error)
- func (r *Replica) VerifyChecksum(ctx context.Context, batch engine.ReadWriter, ms *enginepb.MVCCStats, ...) (roachpb.VerifyChecksumResponse, error)
- type ReplicaDataIterator
- type ReplicaSnapshotDiff
- type Store
- func (s *Store) AcquireRaftSnapshot() bool
- func (s *Store) AddReplicaTest(rng *Replica) error
- func (s *Store) Attrs() roachpb.Attributes
- func (s *Store) Bootstrap(ident roachpb.StoreIdent, stopper *stop.Stopper) error
- func (s *Store) BootstrapRange(initialValues []roachpb.KeyValue) error
- func (s *Store) Capacity() (roachpb.StoreCapacity, error)
- func (s *Store) Clock() *hlc.Clock
- func (s *Store) ClusterID() uuid.UUID
- func (s *Store) ComputeMetrics() error
- func (s *Store) DB() *client.DB
- func (s *Store) Descriptor() (*roachpb.StoreDescriptor, error)
- func (s *Store) DrainLeadership(drain bool) error
- func (s *Store) Engine() engine.Engine
- func (s *Store) FrozenStatus(collectFrozen bool) (descs []roachpb.ReplicaDescriptor)
- func (s *Store) GetReplica(rangeID roachpb.RangeID) (*Replica, error)
- func (s *Store) Gossip() *gossip.Gossip
- func (s *Store) GossipStore()
- func (s *Store) IsDrainingLeadership() bool
- func (s *Store) IsStarted() bool
- func (s *Store) LookupReplica(start, end roachpb.RKey) *Replica
- func (s *Store) MVCCStats() enginepb.MVCCStats
- func (s *Store) MergeRange(subsumingRng *Replica, updatedEndKey roachpb.RKey, ...) error
- func (s *Store) NewRangeDescriptor(start, end roachpb.RKey, replicas []roachpb.ReplicaDescriptor) (*roachpb.RangeDescriptor, error)
- func (s *Store) NewSnapshot() engine.Reader
- func (s *Store) RaftStatus(rangeID roachpb.RangeID) *raft.Status
- func (s *Store) Registry() *metric.Registry
- func (s *Store) ReleaseRaftSnapshot()
- func (s *Store) RemoveReplica(rep *Replica, origDesc roachpb.RangeDescriptor, destroy bool) error
- func (s *Store) ReplicaCount() int
- func (s *Store) ReplicaDescriptor(groupID roachpb.RangeID, replicaID roachpb.ReplicaID) (roachpb.ReplicaDescriptor, error)
- func (s *Store) Reserve(req roachpb.ReservationRequest) roachpb.ReservationResponse
- func (s *Store) Send(ctx context.Context, ba roachpb.BatchRequest) (br *roachpb.BatchResponse, pErr *roachpb.Error)
- func (s *Store) SetRangeRetryOptions(ro retry.Options)
- func (s *Store) SplitRange(origRng, newRng *Replica) error
- func (s *Store) Start(stopper *stop.Stopper) error
- func (s *Store) Stopper() *stop.Stopper
- func (s *Store) StoreID() roachpb.StoreID
- func (s *Store) String() string
- func (s *Store) TestingKnobs() *StoreTestingKnobs
- func (s *Store) Tracer() opentracing.Tracer
- func (s *Store) WaitForInit()
- type StoreContext
- type StoreList
- type StorePool
- type StoreTestingKnobs
- type Stores
- func (ls *Stores) AddStore(s *Store)
- func (ls *Stores) FirstRange() (*roachpb.RangeDescriptor, error)
- func (ls *Stores) GetStore(storeID roachpb.StoreID) (*Store, error)
- func (ls *Stores) GetStoreCount() int
- func (ls *Stores) HasStore(storeID roachpb.StoreID) bool
- func (ls *Stores) RangeLookup(key roachpb.RKey, _ *roachpb.RangeDescriptor, ...) ([]roachpb.RangeDescriptor, []roachpb.RangeDescriptor, *roachpb.Error)
- func (ls *Stores) ReadBootstrapInfo(bi *gossip.BootstrapInfo) error
- func (ls *Stores) RemoveStore(s *Store)
- func (ls *Stores) Send(ctx context.Context, ba roachpb.BatchRequest) (*roachpb.BatchResponse, *roachpb.Error)
- func (ls *Stores) VisitStores(visitor func(s *Store) error) error
- func (ls *Stores) WriteBootstrapInfo(bi *gossip.BootstrapInfo) error
Examples ¶
Constants ¶
const ( // RaftLogQueueTimerDuration is the duration between truncations. This needs // to be relatively short so that truncations can keep up with raft log entry // creation. RaftLogQueueTimerDuration = 50 * time.Millisecond // RaftLogQueueStaleThreshold is the minimum threshold for stale raft log // entries. A stale entry is one which all replicas of the range have // progressed past and thus is no longer needed and can be pruned. RaftLogQueueStaleThreshold = 100 )
const ( // TestTimeUntilStoreDead is the test value for TimeUntilStoreDead to // quickly mark stores as dead. TestTimeUntilStoreDead = 5 * time.Millisecond // TestTimeUntilStoreDeadOff is the test value for TimeUntilStoreDead that // prevents the store pool from marking stores as dead. TestTimeUntilStoreDeadOff = 24 * time.Hour )
const ( // MinTSCacheWindow specifies the minimum duration to hold entries in // the cache before allowing eviction. After this window expires, // transactions writing to this node with timestamps lagging by more // than minCacheWindow will necessarily have to advance their commit // timestamp. MinTSCacheWindow = 10 * time.Second )
const RangeEventTableSchema = `` /* 314-byte string literal not displayed */
RangeEventTableSchema defines the schema of the event log table. It is currently envisioned as a wide table; many different event types can be recorded to the table.
const ( // ReplicaGCQueueInactivityThreshold is the inactivity duration after which // a range will be considered for garbage collection. Exported for testing. ReplicaGCQueueInactivityThreshold = 10 * 24 * time.Hour // 10 days )
Variables ¶
var ( ErrInvalidLengthRaft = fmt.Errorf("proto: negative length found during unmarshaling") ErrIntOverflowRaft = fmt.Errorf("proto: integer overflow") )
var ( ErrInvalidLengthRangetree = fmt.Errorf("proto: negative length found during unmarshaling") ErrIntOverflowRangetree = fmt.Errorf("proto: integer overflow") )
Functions ¶
func ComputeStatsForRange ¶
func ComputeStatsForRange( d *roachpb.RangeDescriptor, e engine.Reader, nowNanos int64, ) (enginepb.MVCCStats, error)
ComputeStatsForRange computes the stats for a given range by iterating over all key ranges for the given range that should be accounted for in its stats.
func DecodeRaftCommand ¶
DecodeRaftCommand splits a raftpb.Entry.Data into its commandID and command portions. The caller is responsible for checking that the data is not empty (which indicates a dummy entry generated by raft rather than a real command). Usage is mostly internal to the storage package but is exported for use by debugging tools.
func DeleteRange ¶
DeleteRange removes a range from the RangeTree. This should only be called from operations that remove ranges, such as AdminMerge.
func InsertRange ¶
InsertRange adds a new range to the RangeTree. This should only be called from operations that create new ranges, such as AdminSplit.
func IterateRangeDescriptors ¶
func IterateRangeDescriptors( eng engine.Reader, fn func(desc roachpb.RangeDescriptor) (bool, error), ) error
IterateRangeDescriptors calls the provided function with each descriptor from the provided Engine. The return values of this method and fn have semantics similar to engine.MVCCIterate.
func RegisterMultiRaftServer ¶
func RegisterMultiRaftServer(s *grpc.Server, srv MultiRaftServer)
func SetupRangeTree ¶
func SetupRangeTree( ctx context.Context, batch engine.Batch, ms *enginepb.MVCCStats, timestamp hlc.Timestamp, startKey roachpb.RKey, ) error
SetupRangeTree creates a new RangeTree. This should only be called as part of store.BootstrapRange. TODO(tschottdorf): other RangeTree operations should also propagate a Context.
func TrackRaftProtos ¶
TrackRaftProtos instruments proto marshalling to track protos which are marshalled downstream of raft. It returns a function that removes the instrumentation and returns the list of downstream-of-raft protos.
Types ¶
type AbortCache ¶
type AbortCache struct {
// contains filtered or unexported fields
}
The AbortCache sets markers for aborted transactions to provide protection against an aborted but active transaction not reading values it wrote (due to its intents having been removed).
The AbortCache stores responses in the underlying engine, using keys derived from Range ID and txn ID.
A AbortCache is not thread safe. Access to it is serialized through Raft.
func NewAbortCache ¶
func NewAbortCache(rangeID roachpb.RangeID) *AbortCache
NewAbortCache returns a new abort cache. Every range replica maintains an abort cache, not just the leader.
func (*AbortCache) ClearData ¶
func (sc *AbortCache) ClearData(e engine.Engine) error
ClearData removes all persisted items stored in the cache.
func (*AbortCache) CopyFrom ¶
func (sc *AbortCache) CopyFrom( ctx context.Context, e engine.ReadWriter, ms *enginepb.MVCCStats, originRangeID roachpb.RangeID, ) (int, error)
CopyFrom copies all the persisted results from the originRangeID abort cache into this one. Note that the cache will not be locked while copying is in progress. Failures decoding individual entries return an error. The copy is done directly using the engine instead of interpreting values through MVCC for efficiency. On success, returns the number of entries (key-value pairs) copied.
func (*AbortCache) CopyInto ¶
func (sc *AbortCache) CopyInto( e engine.ReadWriter, ms *enginepb.MVCCStats, destRangeID roachpb.RangeID, ) (int, error)
CopyInto copies all the results from this abort cache into the destRangeID abort cache. Failures decoding individual cache entries return an error. On success, returns the number of entries (key-value pairs) copied.
func (*AbortCache) Del ¶
func (sc *AbortCache) Del( ctx context.Context, e engine.ReadWriter, ms *enginepb.MVCCStats, txnID *uuid.UUID, ) error
Del removes all abort cache entries for the given transaction.
func (*AbortCache) Get ¶
func (sc *AbortCache) Get( ctx context.Context, e engine.Reader, txnID *uuid.UUID, entry *roachpb.AbortCacheEntry, ) (bool, error)
Get looks up an abort cache entry recorded for this transaction ID. Returns whether an abort record was found and any error.
func (*AbortCache) Iterate ¶
func (sc *AbortCache) Iterate( ctx context.Context, e engine.Reader, f func([]byte, *uuid.UUID, roachpb.AbortCacheEntry), )
Iterate walks through the abort cache, invoking the given callback for each unmarshaled entry with the key, the transaction ID and the decoded entry. TODO(tschottdorf): should not use a pointer to UUID.
func (*AbortCache) Put ¶
func (sc *AbortCache) Put( ctx context.Context, e engine.ReadWriter, ms *enginepb.MVCCStats, txnID *uuid.UUID, entry *roachpb.AbortCacheEntry, ) error
Put writes an entry for the specified transaction ID.
type Allocator ¶
type Allocator struct {
// contains filtered or unexported fields
}
Allocator tries to spread replicas as evenly as possible across the stores in the cluster.
func MakeAllocator ¶
func MakeAllocator(storePool *StorePool, options AllocatorOptions) Allocator
MakeAllocator creates a new allocator using the specified StorePool.
func (*Allocator) AllocateTarget ¶
func (a *Allocator) AllocateTarget(required roachpb.Attributes, existing []roachpb.ReplicaDescriptor, relaxConstraints bool, filter func(storeDesc *roachpb.StoreDescriptor, count, used *stat) bool) (*roachpb.StoreDescriptor, error)
AllocateTarget returns a suitable store for a new allocation with the required attributes. Nodes already accommodating existing replicas are ruled out as targets. If relaxConstraints is true, then the required attributes will be relaxed as necessary, from least specific to most specific, in order to allocate a target. If needed, a filter function can be added that further filter the results. The function will be passed the storeDesc and the used and new counts. It returns a bool indicating inclusion or exclusion from the set of stores being considered.
func (*Allocator) ComputeAction ¶
func (a *Allocator) ComputeAction(zone config.ZoneConfig, desc *roachpb.RangeDescriptor) ( AllocatorAction, float64)
ComputeAction determines the exact operation needed to repair the supplied range, as governed by the supplied zone configuration. It returns the required action that should be taken and a replica on which the action should be performed.
func (Allocator) RebalanceTarget ¶
func (a Allocator) RebalanceTarget( storeID roachpb.StoreID, required roachpb.Attributes, existing []roachpb.ReplicaDescriptor, ) *roachpb.StoreDescriptor
RebalanceTarget returns a suitable store for a rebalance target with required attributes. Rebalance targets are selected via the same mechanism as AllocateTarget(), except the chosen target must follow some additional criteria. Namely, if chosen, it must further the goal of balancing the cluster.
The supplied parameters are the StoreID of the replica being rebalanced, the required attributes for the replica being rebalanced, and a list of the existing replicas of the range (which must include the replica being rebalanced).
Simply ignoring a rebalance opportunity in the event that the target chosen by AllocateTarget() doesn't fit balancing criteria is perfectly fine, as other stores in the cluster will also be doing their probabilistic best to rebalance. This helps prevent a stampeding herd targeting an abnormally under-utilized store.
func (Allocator) RemoveTarget ¶
func (a Allocator) RemoveTarget(existing []roachpb.ReplicaDescriptor) (roachpb.ReplicaDescriptor, error)
RemoveTarget returns a suitable replica to remove from the provided replica set. It attempts to consider which of the provided replicas would be the best candidate for removal.
TODO(mrtracy): removeTarget eventually needs to accept the attributes from the zone config associated with the provided replicas. This will allow it to make correct decisions in the case of ranges with heterogeneous replica requirements (i.e. multiple data centers).
type AllocatorAction ¶
type AllocatorAction int
AllocatorAction enumerates the various replication adjustments that may be recommended by the allocator.
const ( AllocatorNoop AllocatorAction AllocatorRemove AllocatorAdd AllocatorRemoveDead )
These are the possible allocator actions.
type AllocatorOptions ¶
type AllocatorOptions struct { // AllowRebalance allows this store to attempt to rebalance its own // replicas to other stores. AllowRebalance bool // Deterministic makes allocation decisions deterministic, based on // current cluster statistics. If this flag is not set, allocation operations // will have random behavior. This flag is intended to be set for testing // purposes only. Deterministic bool }
AllocatorOptions are configurable options which effect the way that the replicate queue will handle rebalancing opportunities.
type CommandQueue ¶
type CommandQueue struct {
// contains filtered or unexported fields
}
A CommandQueue maintains an interval tree of keys or key ranges for executing commands. New commands affecting keys or key ranges must wait on already-executing commands which overlap their key range.
Before executing, a command invokes GetWait() to initialize a WaitGroup with the number of overlapping commands which are already running. The wait group is waited on by the caller for confirmation that all overlapping, pending commands have completed and the pending command can proceed.
After waiting, a command is added to the queue's already-executing set via add(). add accepts a parameter indicating whether the command is read-only. Read-only commands don't need to wait on other read-only commands, so the wait group returned via GetWait() doesn't include read-only on read-only overlapping commands as an optimization.
Once commands complete, remove() is invoked to remove the executing command and decrement the counts on any pending WaitGroups, possibly signaling waiting commands who were gated by the executing command's affected key(s).
CommandQueue is not thread safe.
func NewCommandQueue ¶
func NewCommandQueue() *CommandQueue
NewCommandQueue returns a new command queue.
type ConfChangeContext ¶
type ConfChangeContext struct { CommandID string `protobuf:"bytes,1,opt,name=command_id,json=commandId" json:"command_id"` // Payload is the application-level command (i.e. an encoded // roachpb.EndTransactionRequest). Payload []byte `protobuf:"bytes,2,opt,name=payload" json:"payload,omitempty"` // Replica contains full details about the replica being added or removed. Replica cockroach_roachpb.ReplicaDescriptor `protobuf:"bytes,3,opt,name=replica" json:"replica"` }
ConfChangeContext is encoded in the raftpb.ConfChange.Context field.
func (*ConfChangeContext) Descriptor ¶
func (*ConfChangeContext) Descriptor() ([]byte, []int)
func (*ConfChangeContext) Marshal ¶
func (m *ConfChangeContext) Marshal() (data []byte, err error)
func (*ConfChangeContext) MarshalTo ¶
func (m *ConfChangeContext) MarshalTo(data []byte) (int, error)
func (*ConfChangeContext) ProtoMessage ¶
func (*ConfChangeContext) ProtoMessage()
func (*ConfChangeContext) Reset ¶
func (m *ConfChangeContext) Reset()
func (*ConfChangeContext) Size ¶
func (m *ConfChangeContext) Size() (n int)
func (*ConfChangeContext) String ¶
func (m *ConfChangeContext) String() string
func (*ConfChangeContext) Unmarshal ¶
func (m *ConfChangeContext) Unmarshal(data []byte) error
type GCInfo ¶
type GCInfo struct { // Now is the timestamp used for age computations. Now hlc.Timestamp // Policy is the policy used for this garbage collection cycle. Policy config.GCPolicy // Stats about the userspace key-values considered, namely the number of // keys with GC'able data, the number of "old" intents and the number of // associated distinct transactions. GCKeys, IntentsConsidered, IntentTxns int // TransactionSpanTotal is the total number of entries in the transaction span. TransactionSpanTotal int // Summary of transactions which were found GCable (assuming that // potentially necessary intent resolutions did not fail). TransactionSpanGCAborted, TransactionSpanGCCommitted, TransactionSpanGCPending int // AbortSpanTotal is the total number of transactions present in the abort cache. AbortSpanTotal int // AbortSpanConsidered is the number of abort cache entries old enough to be // considered for removal. An "entry" corresponds to one transaction; // more than one key-value pair may be associated with it. AbortSpanConsidered int // AbortSpanGCNum is the number of abort cache entries fit for removal (due // to their transactions having terminated). AbortSpanGCNum int // PushTxn is the total number of pushes attempted in this cycle. PushTxn int // ResolveTotal is the total number of attempted intent resolutions in // this cycle. ResolveTotal int // ResolveErrors is the number of successful intent resolutions. ResolveSuccess int // Threshold is the computed expiration timestamp. Equal to `Now - Policy`. Threshold hlc.Timestamp }
GCInfo contains statistics and insights from a GC run.
func RunGC ¶
func RunGC( ctx context.Context, desc *roachpb.RangeDescriptor, snap engine.Reader, now hlc.Timestamp, policy config.GCPolicy, pushTxn pushFunc, resolveIntents resolveFunc, ) ([]roachpb.GCRequest_GCKey, GCInfo, error)
RunGC runs garbage collection for the specified descriptor on the provided Engine (which is not mutated). It uses the provided functions pushTxn and resolveIntents to clarify the true status of and clean up after encountered transactions. It returns a slice of gc'able keys from the data, transaction, and abort spans.
type MultiRaftClient ¶
type MultiRaftClient interface {
RaftMessage(ctx context.Context, opts ...grpc.CallOption) (MultiRaft_RaftMessageClient, error)
}
func NewMultiRaftClient ¶
func NewMultiRaftClient(cc *grpc.ClientConn) MultiRaftClient
type MultiRaftServer ¶
type MultiRaftServer interface {
RaftMessage(MultiRaft_RaftMessageServer) error
}
type MultiRaft_RaftMessageClient ¶
type MultiRaft_RaftMessageClient interface { Send(*RaftMessageRequest) error CloseAndRecv() (*RaftMessageResponse, error) grpc.ClientStream }
type MultiRaft_RaftMessageServer ¶
type MultiRaft_RaftMessageServer interface { SendAndClose(*RaftMessageResponse) error Recv() (*RaftMessageRequest, error) grpc.ServerStream }
type NodeAddressResolver ¶
NodeAddressResolver is the function used by RaftTransport to map node IDs to network addresses.
func GossipAddressResolver ¶
func GossipAddressResolver(gossip *gossip.Gossip) NodeAddressResolver
GossipAddressResolver is a thin wrapper around gossip's GetNodeIDAddress that allows its return value to be used as the net.Addr interface.
type NotBootstrappedError ¶
type NotBootstrappedError struct{}
A NotBootstrappedError indicates that an engine has not yet been bootstrapped due to a store identifier not being present.
func (*NotBootstrappedError) Error ¶
func (e *NotBootstrappedError) Error() string
Error formats error.
type RaftMessageRequest ¶
type RaftMessageRequest struct { GroupID github_com_cockroachdb_cockroach_roachpb.RangeID `protobuf:"varint,1,opt,name=group_id,json=groupId,casttype=github.com/cockroachdb/cockroach/roachpb.RangeID" json:"group_id"` FromReplica cockroach_roachpb.ReplicaDescriptor `protobuf:"bytes,2,opt,name=from_replica,json=fromReplica" json:"from_replica"` ToReplica cockroach_roachpb.ReplicaDescriptor `protobuf:"bytes,3,opt,name=to_replica,json=toReplica" json:"to_replica"` Message raftpb.Message `protobuf:"bytes,4,opt,name=message" json:"message"` }
RaftMessageRequest is the request used to send raft messages using our protobuf-based RPC codec.
func (*RaftMessageRequest) Descriptor ¶
func (*RaftMessageRequest) Descriptor() ([]byte, []int)
func (*RaftMessageRequest) GetUser ¶
func (*RaftMessageRequest) GetUser() string
GetUser implements security.RequestWithUser. Raft messages are always sent by the node user.
func (*RaftMessageRequest) Marshal ¶
func (m *RaftMessageRequest) Marshal() (data []byte, err error)
func (*RaftMessageRequest) MarshalTo ¶
func (m *RaftMessageRequest) MarshalTo(data []byte) (int, error)
func (*RaftMessageRequest) ProtoMessage ¶
func (*RaftMessageRequest) ProtoMessage()
func (*RaftMessageRequest) Reset ¶
func (m *RaftMessageRequest) Reset()
func (*RaftMessageRequest) Size ¶
func (m *RaftMessageRequest) Size() (n int)
func (*RaftMessageRequest) String ¶
func (m *RaftMessageRequest) String() string
func (*RaftMessageRequest) Unmarshal ¶
func (m *RaftMessageRequest) Unmarshal(data []byte) error
type RaftMessageResponse ¶
type RaftMessageResponse struct { }
RaftMessageResponse is an empty message returned by raft RPCs. If a response is needed it will be sent as a separate message.
func (*RaftMessageResponse) Descriptor ¶
func (*RaftMessageResponse) Descriptor() ([]byte, []int)
func (*RaftMessageResponse) Marshal ¶
func (m *RaftMessageResponse) Marshal() (data []byte, err error)
func (*RaftMessageResponse) MarshalTo ¶
func (m *RaftMessageResponse) MarshalTo(data []byte) (int, error)
func (*RaftMessageResponse) ProtoMessage ¶
func (*RaftMessageResponse) ProtoMessage()
func (*RaftMessageResponse) Reset ¶
func (m *RaftMessageResponse) Reset()
func (*RaftMessageResponse) Size ¶
func (m *RaftMessageResponse) Size() (n int)
func (*RaftMessageResponse) String ¶
func (m *RaftMessageResponse) String() string
func (*RaftMessageResponse) Unmarshal ¶
func (m *RaftMessageResponse) Unmarshal(data []byte) error
type RaftSnapshotStatus ¶
type RaftSnapshotStatus struct { Req *RaftMessageRequest Err error }
RaftSnapshotStatus contains a MsgSnap message and its resulting error, for asynchronous notification of completion.
type RaftTransport ¶
type RaftTransport struct { SnapshotStatusChan chan RaftSnapshotStatus // contains filtered or unexported fields }
RaftTransport handles the rpc messages for raft.
func NewDummyRaftTransport ¶
func NewDummyRaftTransport() *RaftTransport
NewDummyRaftTransport returns a dummy raft transport for use in tests which need a non-nil raft transport that need not function.
func NewRaftTransport ¶
func NewRaftTransport(resolver NodeAddressResolver, grpcServer *grpc.Server, rpcContext *rpc.Context) *RaftTransport
NewRaftTransport creates a new RaftTransport with specified resolver and grpc server. Callers are responsible for monitoring RaftTransport.SnapshotStatusChan.
func (*RaftTransport) Listen ¶
func (t *RaftTransport) Listen(storeID roachpb.StoreID, handler raftMessageHandler)
Listen registers a raftMessageHandler to receive proxied messages.
func (*RaftTransport) RaftMessage ¶
func (t *RaftTransport) RaftMessage(stream MultiRaft_RaftMessageServer) (err error)
RaftMessage proxies the incoming request to the listening server interface.
func (*RaftTransport) Send ¶
func (t *RaftTransport) Send(req *RaftMessageRequest) error
Send a message to the recipient specified in the request.
func (*RaftTransport) Stop ¶
func (t *RaftTransport) Stop(storeID roachpb.StoreID)
Stop unregisters a raftMessageHandler.
type RangeEventLogType ¶
type RangeEventLogType string
RangeEventLogType describes a specific event type recorded in the range log table.
const ( // RangeEventLogSplit is the event type recorded when a range splits. RangeEventLogSplit RangeEventLogType = "split" // RangeEventLogAdd is the event type recorded when a range adds a // new replica. RangeEventLogAdd RangeEventLogType = "add" // RangeEventLogRemove is the event type recorded when a range removes a // replica. RangeEventLogRemove RangeEventLogType = "remove" )
type RangeTree ¶
type RangeTree struct {
RootKey github_com_cockroachdb_cockroach_roachpb.RKey `` /* 130-byte string literal not displayed */
}
RangeTree holds the root node of the range tree.
func (*RangeTree) Descriptor ¶
func (*RangeTree) ProtoMessage ¶
func (*RangeTree) ProtoMessage()
type RangeTreeNode ¶
type RangeTreeNode struct { Key github_com_cockroachdb_cockroach_roachpb.RKey `protobuf:"bytes,1,opt,name=key,casttype=github.com/cockroachdb/cockroach/roachpb.RKey" json:"key,omitempty"` // Color is black if true, red if false. Black bool `protobuf:"varint,2,opt,name=black" json:"black"` // If the parent key is null, this is the root node. ParentKey github_com_cockroachdb_cockroach_roachpb.RKey `` /* 136-byte string literal not displayed */ LeftKey github_com_cockroachdb_cockroach_roachpb.RKey `` /* 130-byte string literal not displayed */ RightKey github_com_cockroachdb_cockroach_roachpb.RKey `` /* 133-byte string literal not displayed */ }
RangeTreeNode holds the configuration for each node of the Red-Black Tree that references all ranges.
func (*RangeTreeNode) Descriptor ¶
func (*RangeTreeNode) Descriptor() ([]byte, []int)
func (*RangeTreeNode) Marshal ¶
func (m *RangeTreeNode) Marshal() (data []byte, err error)
func (*RangeTreeNode) ProtoMessage ¶
func (*RangeTreeNode) ProtoMessage()
func (*RangeTreeNode) Reset ¶
func (m *RangeTreeNode) Reset()
func (*RangeTreeNode) Size ¶
func (m *RangeTreeNode) Size() (n int)
func (*RangeTreeNode) String ¶
func (m *RangeTreeNode) String() string
func (*RangeTreeNode) Unmarshal ¶
func (m *RangeTreeNode) Unmarshal(data []byte) error
type Replica ¶
type Replica struct { // TODO(tschottdorf): Duplicates r.mu.state.desc.RangeID; revisit that. RangeID roachpb.RangeID // Should only be set by the constructor. // contains filtered or unexported fields }
A Replica is a contiguous keyspace with writes managed via an instance of the Raft consensus algorithm. Many ranges may exist in a store and they are unlikely to be contiguous. Ranges are independent units and are responsible for maintaining their own integrity by replacing failed replicas, splitting and merging as appropriate.
func NewReplica ¶
func NewReplica(desc *roachpb.RangeDescriptor, store *Store, replicaID roachpb.ReplicaID) (*Replica, error)
NewReplica initializes the replica using the given metadata. If the replica is initialized (i.e. desc contains more than a RangeID), replicaID should be 0 and the replicaID will be discovered from the descriptor.
func (*Replica) AdminMerge ¶
func (r *Replica) AdminMerge( ctx context.Context, args roachpb.AdminMergeRequest, origLeftDesc *roachpb.RangeDescriptor, ) (roachpb.AdminMergeResponse, *roachpb.Error)
AdminMerge extends this range to subsume the range that comes next in the key space. The merge is performed inside of a distributed transaction which writes the updated range descriptor for the subsuming range and deletes the range descriptor for the subsumed one. It also updates the range addressing metadata. The handover of responsibility for the reassigned key range is carried out seamlessly through a merge trigger carried out as part of the commit of that transaction. A merge requires that the two ranges are collocated on the same set of replicas.
The supplied RangeDescriptor is used as a form of optimistic lock. See the comment of "AdminSplit" for more information on this pattern.
func (*Replica) AdminSplit ¶
func (r *Replica) AdminSplit( ctx context.Context, args roachpb.AdminSplitRequest, desc *roachpb.RangeDescriptor, ) (roachpb.AdminSplitResponse, *roachpb.Error)
AdminSplit divides the range into into two ranges, using either args.SplitKey (if provided) or an internally computed key that aims to roughly equipartition the range by size. The split is done inside of a distributed txn which writes updated and new range descriptors, and updates the range addressing metadata. The handover of responsibility for the reassigned key range is carried out seamlessly through a split trigger carried out as part of the commit of that transaction.
The supplied RangeDescriptor is used as a form of optimistic lock. An operation which might split a range should obtain a copy of the range's current descriptor before making the decision to split. If the decision is affirmative the descriptor is passed to AdminSplit, which performs a Conditional Put on the RangeDescriptor to ensure that no other operation has modified the range in the time the decision was being made. TODO(tschottdorf): should assert that split key is not a local key.
func (*Replica) BeginTransaction ¶
func (r *Replica) BeginTransaction( ctx context.Context, batch engine.ReadWriter, ms *enginepb.MVCCStats, h roachpb.Header, args roachpb.BeginTransactionRequest, ) (roachpb.BeginTransactionResponse, error)
BeginTransaction writes the initial transaction record. Fails in the event that a transaction record is already written. This may occur if a transaction is started with a batch containing writes to different ranges, and the range containing the txn record fails to receive the write batch before a heartbeat or txn push is performed first and aborts the transaction.
func (*Replica) ChangeFrozen ¶
func (r *Replica) ChangeFrozen( ctx context.Context, batch engine.ReadWriter, ms *enginepb.MVCCStats, h roachpb.Header, args roachpb.ChangeFrozenRequest, ) (roachpb.ChangeFrozenResponse, error)
ChangeFrozen freezes or unfreezes the Replica idempotently.
func (*Replica) ChangeReplicas ¶
func (r *Replica) ChangeReplicas( changeType roachpb.ReplicaChangeType, replica roachpb.ReplicaDescriptor, desc *roachpb.RangeDescriptor, ) error
ChangeReplicas adds or removes a replica of a range. The change is performed in a distributed transaction and takes effect when that transaction is committed. When removing a replica, only the NodeID and StoreID fields of the Replica are used.
The supplied RangeDescriptor is used as a form of optimistic lock. See the comment of "AdminSplit" for more information on this pattern.
func (*Replica) CheckConsistency ¶
func (r *Replica) CheckConsistency( args roachpb.CheckConsistencyRequest, desc *roachpb.RangeDescriptor, ) (roachpb.CheckConsistencyResponse, *roachpb.Error)
CheckConsistency runs a consistency check on the range. It first applies a ComputeChecksum command on the range. It then applies a VerifyChecksum command passing along a locally computed checksum for the range.
func (*Replica) ComputeChecksum ¶
func (r *Replica) ComputeChecksum( ctx context.Context, batch engine.ReadWriter, ms *enginepb.MVCCStats, h roachpb.Header, args roachpb.ComputeChecksumRequest, ) (roachpb.ComputeChecksumResponse, error)
ComputeChecksum starts the process of computing a checksum on the replica at a particular snapshot. The checksum is later verified through the VerifyChecksum request.
func (*Replica) ConditionalPut ¶
func (r *Replica) ConditionalPut( ctx context.Context, batch engine.ReadWriter, ms *enginepb.MVCCStats, h roachpb.Header, args roachpb.ConditionalPutRequest, ) (roachpb.ConditionalPutResponse, error)
ConditionalPut sets the value for a specified key only if the expected value matches. If not, the return value contains the actual value.
func (*Replica) ContainsKey ¶
ContainsKey returns whether this range contains the specified key.
func (*Replica) ContainsKeyRange ¶
ContainsKeyRange returns whether this range contains the specified key range from start to end.
func (*Replica) Delete ¶
func (r *Replica) Delete( ctx context.Context, batch engine.ReadWriter, ms *enginepb.MVCCStats, h roachpb.Header, args roachpb.DeleteRequest, ) (roachpb.DeleteResponse, error)
Delete deletes the key and value specified by key.
func (*Replica) DeleteRange ¶
func (r *Replica) DeleteRange( ctx context.Context, batch engine.ReadWriter, ms *enginepb.MVCCStats, h roachpb.Header, args roachpb.DeleteRangeRequest, ) (roachpb.DeleteRangeResponse, error)
DeleteRange deletes the range of key/value pairs specified by start and end keys.
func (*Replica) Desc ¶
func (r *Replica) Desc() *roachpb.RangeDescriptor
Desc returns the range's descriptor.
func (*Replica) Destroy ¶
func (r *Replica) Destroy(origDesc roachpb.RangeDescriptor) error
Destroy clears pending command queue by sending each pending command an error and cleans up all data associated with this range.
func (*Replica) EndTransaction ¶
func (r *Replica) EndTransaction( ctx context.Context, batch engine.ReadWriter, ms *enginepb.MVCCStats, h roachpb.Header, args roachpb.EndTransactionRequest, ) (roachpb.EndTransactionResponse, []roachpb.Intent, error)
EndTransaction either commits or aborts (rolls back) an extant transaction according to the args.Commit parameter. Rolling back an already rolled-back txn is ok.
func (*Replica) Entries ¶
Entries implements the raft.Storage interface. Note that maxBytes is advisory and this method will always return at least one entry even if it exceeds maxBytes. Passing maxBytes equal to zero disables size checking. TODO(bdarnell): consider caching for recent entries, if rocksdb's built in caching is insufficient. Entries requires that the replica lock is held.
func (*Replica) FirstIndex ¶
FirstIndex implements the raft.Storage interface. FirstIndex requires that the replica lock is held.
func (*Replica) GC ¶
func (r *Replica) GC( ctx context.Context, batch engine.ReadWriter, ms *enginepb.MVCCStats, h roachpb.Header, args roachpb.GCRequest, ) (roachpb.GCResponse, error)
GC iterates through the list of keys to garbage collect specified in the arguments. MVCCGarbageCollect is invoked on each listed key along with the expiration timestamp. The GC metadata specified in the args is persisted after GC.
func (*Replica) Get ¶
func (r *Replica) Get( ctx context.Context, batch engine.ReadWriter, h roachpb.Header, args roachpb.GetRequest, ) (roachpb.GetResponse, []roachpb.Intent, error)
Get returns the value for a specified key.
func (*Replica) GetFirstIndex ¶
GetFirstIndex is the same function as FirstIndex but it does not require that the replica lock is held.
func (*Replica) GetMVCCStats ¶
GetMVCCStats returns a copy of the MVCC stats object for this range.
func (*Replica) GetMaxBytes ¶
GetMaxBytes atomically gets the range maximum byte limit.
func (*Replica) GetReplica ¶
func (r *Replica) GetReplica() (*roachpb.ReplicaDescriptor, error)
GetReplica returns the replica for this range from the range descriptor. Returns nil, *errReplicaNotInRange if the replica is not found. No other errors are returned.
func (*Replica) GetSnapshot ¶
GetSnapshot wraps Snapshot() but does not require the replica lock to be held and it will block instead of returning ErrSnapshotTemporaryUnavailable.
func (*Replica) HeartbeatTxn ¶
func (r *Replica) HeartbeatTxn( ctx context.Context, batch engine.ReadWriter, ms *enginepb.MVCCStats, h roachpb.Header, args roachpb.HeartbeatTxnRequest, ) (roachpb.HeartbeatTxnResponse, error)
HeartbeatTxn updates the transaction status and heartbeat timestamp after receiving transaction heartbeat messages from coordinator. Returns the updated transaction.
func (*Replica) Increment ¶
func (r *Replica) Increment( ctx context.Context, batch engine.ReadWriter, ms *enginepb.MVCCStats, h roachpb.Header, args roachpb.IncrementRequest, ) (roachpb.IncrementResponse, error)
Increment increments the value (interpreted as varint64 encoded) and returns the newly incremented value (encoded as varint64). If no value exists for the key, zero is incremented.
func (*Replica) InitPut ¶
func (r *Replica) InitPut( ctx context.Context, batch engine.ReadWriter, ms *enginepb.MVCCStats, h roachpb.Header, args roachpb.InitPutRequest, ) (roachpb.InitPutResponse, error)
InitPut sets the value for a specified key only if it doesn't exist. It returns an error if the key exists with an existing value that is different from the value provided.
func (*Replica) InitialState ¶
InitialState implements the raft.Storage interface. InitialState requires that the replica lock be held.
func (*Replica) IsFirstRange ¶
IsFirstRange returns true if this is the first range.
func (*Replica) IsInitialized ¶
IsInitialized is true if we know the metadata of this range, either because we created it or we have received an initial snapshot from another node. It is false when a range has been created in response to an incoming message but we are waiting for our initial snapshot.
func (*Replica) LastIndex ¶
LastIndex implements the raft.Storage interface. LastIndex requires that the replica lock is held.
func (*Replica) LeaderLease ¶
func (r *Replica) LeaderLease( ctx context.Context, batch engine.ReadWriter, ms *enginepb.MVCCStats, h roachpb.Header, args roachpb.LeaderLeaseRequest, ) (roachpb.LeaderLeaseResponse, error)
LeaderLease sets the leader lease for this range. The command fails only if the desired start timestamp collides with a previous lease. Otherwise, the start timestamp is wound back to right after the expiration of the previous lease (or zero). If this range replica is already the lease holder, the expiration will be extended or shortened as indicated. For a new lease, all duties required of the range leader are commenced, including clearing the command queue and timestamp cache.
func (*Replica) Merge ¶
func (r *Replica) Merge( ctx context.Context, batch engine.ReadWriter, ms *enginepb.MVCCStats, h roachpb.Header, args roachpb.MergeRequest, ) (roachpb.MergeResponse, error)
Merge is used to merge a value into an existing key. Merge is an efficient accumulation operation which is exposed by RocksDB, used by CockroachDB for the efficient accumulation of certain values. Due to the difficulty of making these operations transactional, merges are not currently exposed directly to clients. Merged values are explicitly not MVCC data.
func (*Replica) PushTxn ¶
func (r *Replica) PushTxn( ctx context.Context, batch engine.ReadWriter, ms *enginepb.MVCCStats, h roachpb.Header, args roachpb.PushTxnRequest, ) (roachpb.PushTxnResponse, error)
PushTxn resolves conflicts between concurrent txns (or between a non-transactional reader or writer and a txn) in several ways depending on the statuses and priorities of the conflicting transactions. The PushTxn operation is invoked by a "pusher" (the writer trying to abort a conflicting txn or the reader trying to push a conflicting txn's commit timestamp forward), who attempts to resolve a conflict with a "pushee" (args.PushTxn -- the pushee txn whose intent(s) caused the conflict). A pusher is either transactional, in which case PushTxn is completely initialized, or not, in which case the PushTxn has only the priority set.
Txn already committed/aborted: If pushee txn is committed or aborted return success.
Txn Timeout: If pushee txn entry isn't present or its LastHeartbeat timestamp isn't set, use its as LastHeartbeat. If current time - LastHeartbeat > 2 * DefaultHeartbeatInterval, then the pushee txn should be either pushed forward, aborted, or confirmed not pending, depending on value of Request.PushType.
Old Txn Epoch: If persisted pushee txn entry has a newer Epoch than PushTxn.Epoch, return success, as older epoch may be removed.
Lower Txn Priority: If pushee txn has a lower priority than pusher, adjust pushee's persisted txn depending on value of args.PushType. If args.PushType is PUSH_ABORT, set txn.Status to ABORTED, and priority to one less than the pusher's priority and return success. If args.PushType is PUSH_TIMESTAMP, set txn.Timestamp to just after PushTo.
Higher Txn Priority: If pushee txn has a higher priority than pusher, return TransactionPushError. Transaction will be retried with priority one less than the pushee's higher priority.
If the pusher is non-transactional, args.PusherTxn is an empty proto with only the priority set.
If the pushee is aborted, its timestamp will be forwarded to match its last client activity timestamp (i.e. last heartbeat), if available. This is done so that the updated timestamp populates the abort cache, allowing the GC queue to purge entries for which the transaction coordinator must have found out via its heartbeats that the transaction has failed.
func (*Replica) Put ¶
func (r *Replica) Put( ctx context.Context, batch engine.ReadWriter, ms *enginepb.MVCCStats, h roachpb.Header, args roachpb.PutRequest, ) (roachpb.PutResponse, error)
Put sets the value for a specified key.
func (*Replica) RaftStatus ¶
RaftStatus returns the current raft status of the replica. It returns nil if the Raft group has not been initialized yet.
func (*Replica) RangeLookup ¶
func (r *Replica) RangeLookup( ctx context.Context, batch engine.ReadWriter, h roachpb.Header, args roachpb.RangeLookupRequest, ) (roachpb.RangeLookupResponse, []roachpb.Intent, error)
RangeLookup is used to look up RangeDescriptors - a RangeDescriptor is a metadata structure which describes the key range and replica locations of a distinct range in the cluster.
RangeDescriptors are stored as values in the cockroach cluster's key-value store. However, they are always stored using special "Range Metadata keys", which are "ordinary" keys with a special prefix prepended. The Range Metadata Key for an ordinary key can be generated with the `keys.RangeMetaKey(key)` function. The RangeDescriptor for the range which contains a given key can be retrieved by generating its Range Metadata Key and dispatching it to RangeLookup.
Note that the Range Metadata Key sent to RangeLookup is NOT the key at which the desired RangeDescriptor is stored. Instead, this method returns the RangeDescriptor stored at the _lowest_ existing key which is _greater_ than the given key. The returned RangeDescriptor will thus contain the ordinary key which was originally used to generate the Range Metadata Key sent to RangeLookup.
The "Range Metadata Key" for a range is built by appending the end key of the range to the respective meta prefix.
Lookups for range metadata keys usually want to read inconsistently, but some callers need a consistent result; both are supported.
This method has an important optimization in the inconsistent case: instead of just returning the request RangeDescriptor, it also returns a slice of additional range descriptors immediately consecutive to the desired RangeDescriptor. This is intended to serve as a sort of caching pre-fetch, so that the requesting nodes can aggressively cache RangeDescriptors which are likely to be desired by their current workload. The Reverse flag specifies whether descriptors are prefetched in descending or ascending order.
func (*Replica) ReplicaDescriptor ¶
ReplicaDescriptor returns information about the given member of this replica's range.
func (*Replica) ResolveIntent ¶
func (r *Replica) ResolveIntent( ctx context.Context, batch engine.ReadWriter, ms *enginepb.MVCCStats, h roachpb.Header, args roachpb.ResolveIntentRequest, ) (roachpb.ResolveIntentResponse, error)
ResolveIntent resolves a write intent from the specified key according to the status of the transaction which created it.
func (*Replica) ResolveIntentRange ¶
func (r *Replica) ResolveIntentRange( ctx context.Context, batch engine.ReadWriter, ms *enginepb.MVCCStats, h roachpb.Header, args roachpb.ResolveIntentRangeRequest, ) (roachpb.ResolveIntentRangeResponse, error)
ResolveIntentRange resolves write intents in the specified key range according to the status of the transaction which created it.
func (*Replica) ReverseScan ¶
func (r *Replica) ReverseScan(ctx context.Context, batch engine.ReadWriter, h roachpb.Header, remScanResults int64, args roachpb.ReverseScanRequest) (roachpb.ReverseScanResponse, []roachpb.Intent, error)
ReverseScan scans the key range specified by start key through end key in descending order up to some maximum number of results. remScanResults stores the number of scan results remaining for this batch (MaxInt64 for no limit).
func (*Replica) Scan ¶
func (r *Replica) Scan(ctx context.Context, batch engine.ReadWriter, h roachpb.Header, remScanResults int64, args roachpb.ScanRequest) (roachpb.ScanResponse, []roachpb.Intent, error)
Scan scans the key range specified by start key through end key in ascending order up to some maximum number of results. remScanResults stores the number of scan results remaining for this batch (MaxInt64 for no limit).
func (*Replica) Send ¶
func (r *Replica) Send(ctx context.Context, ba roachpb.BatchRequest) (*roachpb.BatchResponse, *roachpb.Error)
Send adds a command for execution on this range. The command's affected keys are verified to be contained within the range and the range's leadership is confirmed. The command is then dispatched either along the read-only execution path or the read-write Raft command queue.
func (*Replica) SetMaxBytes ¶
SetMaxBytes atomically sets the maximum byte limit before split. This value is cached by the range for efficiency.
func (*Replica) Snapshot ¶
Snapshot implements the raft.Storage interface. Snapshot requires that the replica lock is held.
func (*Replica) State ¶
func (r *Replica) State() storagebase.RangeInfo
State returns a copy of the internal state of the Replica, along with some auxiliary information.
func (*Replica) String ¶
String returns a string representation of the range. It acquires mu.Lock in the call to Desc().
func (*Replica) Term ¶
Term implements the raft.Storage interface. Term requires that the replica lock is held.
func (*Replica) TruncateLog ¶
func (r *Replica) TruncateLog( ctx context.Context, batch engine.ReadWriter, ms *enginepb.MVCCStats, h roachpb.Header, args roachpb.TruncateLogRequest, ) (roachpb.TruncateLogResponse, error)
TruncateLog discards a prefix of the raft log. Truncating part of a log that has already been truncated has no effect. If this range is not the one specified within the request body, the request will also be ignored.
func (*Replica) VerifyChecksum ¶
func (r *Replica) VerifyChecksum( ctx context.Context, batch engine.ReadWriter, ms *enginepb.MVCCStats, h roachpb.Header, args roachpb.VerifyChecksumRequest, ) (roachpb.VerifyChecksumResponse, error)
VerifyChecksum verifies the checksum that was computed through a ComputeChecksum request. This command is marked as IsWrite so that it executes on every replica, but it actually doesn't modify the persistent state on the replica.
Raft commands need to consistently execute on all replicas. An error seen on a particular replica should be returned here only if it is guaranteed to be seen on other replicas. In other words, a command needs to be consistent both in success and failure.
type ReplicaDataIterator ¶
type ReplicaDataIterator struct {
// contains filtered or unexported fields
}
ReplicaDataIterator provides a complete iteration over all key / value rows in a range, including all system-local metadata and user data. The ranges keyRange slice specifies the key ranges which comprise all of the range's data.
A ReplicaDataIterator provides a subset of the engine.Iterator interface.
func NewReplicaDataIterator ¶
func NewReplicaDataIterator( d *roachpb.RangeDescriptor, e engine.Reader, replicatedOnly bool, ) *ReplicaDataIterator
NewReplicaDataIterator creates a ReplicaDataIterator for the given replica.
func (*ReplicaDataIterator) Close ¶
func (ri *ReplicaDataIterator) Close()
Close the underlying iterator.
func (*ReplicaDataIterator) Error ¶
func (ri *ReplicaDataIterator) Error() error
Error returns the error, if any, which the iterator encountered.
func (*ReplicaDataIterator) Key ¶
func (ri *ReplicaDataIterator) Key() engine.MVCCKey
Key returns the current key.
func (*ReplicaDataIterator) Next ¶
func (ri *ReplicaDataIterator) Next()
Next advances to the next key in the iteration.
func (*ReplicaDataIterator) Valid ¶
func (ri *ReplicaDataIterator) Valid() bool
Valid returns true if the iterator currently points to a valid value.
func (*ReplicaDataIterator) Value ¶
func (ri *ReplicaDataIterator) Value() []byte
Value returns the current value.
type ReplicaSnapshotDiff ¶
type ReplicaSnapshotDiff struct { // Leader is set to true of this k:v pair is only present on the leader. Leader bool Key roachpb.Key Timestamp hlc.Timestamp Value []byte }
ReplicaSnapshotDiff is a part of a []ReplicaSnapshotDiff which represents a diff between two replica snapshots. For now it's only a diff between their KV pairs.
type Store ¶
type Store struct { Ident roachpb.StoreIdent // contains filtered or unexported fields }
A Store maintains a map of ranges by start key. A Store corresponds to one physical device.
func NewStore ¶
func NewStore(ctx StoreContext, eng engine.Engine, nodeDesc *roachpb.NodeDescriptor) *Store
NewStore returns a new instance of a store.
func (*Store) AcquireRaftSnapshot ¶
AcquireRaftSnapshot returns true if a new raft snapshot can start. If true is returned, the caller MUST call ReleaseRaftSnapshot.
func (*Store) AddReplicaTest ¶
AddReplicaTest adds the replica to the store's replica map and to the sorted replicasByKey slice. To be used only by unittests.
func (*Store) Attrs ¶
func (s *Store) Attrs() roachpb.Attributes
Attrs returns the attributes of the underlying store.
func (*Store) Bootstrap ¶
Bootstrap writes a new store ident to the underlying engine. To ensure that no crufty data already exists in the engine, it scans the engine contents before writing the new store ident. The engine should be completely empty. It returns an error if called on a non-empty engine.
func (*Store) BootstrapRange ¶
BootstrapRange creates the first range in the cluster and manually writes it to the store. Default range addressing records are created for meta1 and meta2. Default configurations for zones are created. All configs are specified for the empty key prefix, meaning they apply to the entire database. The zone requires three replicas with no other specifications. It also adds the range tree and the root node, the first range, to it. The 'initialValues' are written as well after each value's checksum is initialized.
func (*Store) Capacity ¶
func (s *Store) Capacity() (roachpb.StoreCapacity, error)
Capacity returns the capacity of the underlying storage engine. Note that this does not include reservations.
func (*Store) ComputeMetrics ¶
ComputeMetrics immediately computes the current value of store metrics which cannot be computed incrementally. This method should be invoked periodically by a higher-level system which records store metrics.
func (*Store) Descriptor ¶
func (s *Store) Descriptor() (*roachpb.StoreDescriptor, error)
Descriptor returns a StoreDescriptor including current store capacity information.
func (*Store) DrainLeadership ¶
DrainLeadership (when called with 'true') prevents all of the Store's Replicas from acquiring or extending leader leases and waits until all of them have expired. If an error is returned, the draining state is still active, but there may be active leases held by some of the Store's Replicas. When called with 'false', returns to the normal mode of operation.
func (*Store) FrozenStatus ¶
func (s *Store) FrozenStatus(collectFrozen bool) (descs []roachpb.ReplicaDescriptor)
FrozenStatus returns all of the Store's Replicas which are frozen (if the parameter is false) or unfrozen (otherwise). It makes no attempt to prevent new data being rebalanced to the Store, and thus does not guarantee that the Store remains in the reported state.
func (*Store) GetReplica ¶
GetReplica fetches a replica by Range ID. Returns an error if no replica is found.
func (*Store) GossipStore ¶
func (s *Store) GossipStore()
GossipStore broadcasts the store on the gossip network.
func (*Store) IsDrainingLeadership ¶
IsDrainingLeadership accessor.
func (*Store) LookupReplica ¶
LookupReplica looks up a replica via binary search over the "replicasByKey" btree. Returns nil if no replica is found for specified key range. Note that the specified keys are transformed using Key.Address() to ensure we lookup replicas correctly for local keys. When end is nil, a replica that contains start is looked up.
func (*Store) MVCCStats ¶
MVCCStats returns the current MVCCStats accumulated for this store. TODO(mrtracy): This should be removed as part of #4465, this is only needed to support the current StatusSummary structures which will be changing.
func (*Store) MergeRange ¶
func (s *Store) MergeRange(subsumingRng *Replica, updatedEndKey roachpb.RKey, subsumedRangeID roachpb.RangeID) error
MergeRange expands the subsuming range to absorb the subsumed range. This merge operation will fail if the two ranges are not collocated on the same store. Must be called from the processRaft goroutine.
func (*Store) NewRangeDescriptor ¶
func (s *Store) NewRangeDescriptor( start, end roachpb.RKey, replicas []roachpb.ReplicaDescriptor, ) (*roachpb.RangeDescriptor, error)
NewRangeDescriptor creates a new descriptor based on start and end keys and the supplied roachpb.Replicas slice. It allocates new replica IDs to fill out the supplied replicas.
func (*Store) NewSnapshot ¶
NewSnapshot creates a new snapshot engine.
func (*Store) RaftStatus ¶
RaftStatus returns the current raft status of the local replica of the given range.
func (*Store) ReleaseRaftSnapshot ¶
func (s *Store) ReleaseRaftSnapshot()
ReleaseRaftSnapshot decrements the count of active snapshots.
func (*Store) RemoveReplica ¶
RemoveReplica removes the replica from the store's replica map and from the sorted replicasByKey btree. The version of the replica descriptor that was used to make the removal decision is passed in, and the removal is aborted if the replica ID has changed since then. If `destroy` is true, all data beloing to the replica will be deleted. In either case a tombstone record will be written.
func (*Store) ReplicaCount ¶
ReplicaCount returns the number of replicas contained by this store.
func (*Store) ReplicaDescriptor ¶
func (s *Store) ReplicaDescriptor(groupID roachpb.RangeID, replicaID roachpb.ReplicaID) (roachpb.ReplicaDescriptor, error)
ReplicaDescriptor returns the replica descriptor for the given range and replica, if known.
func (*Store) Reserve ¶
func (s *Store) Reserve(req roachpb.ReservationRequest) roachpb.ReservationResponse
Reserve requests a reservation from the store's bookie.
func (*Store) Send ¶
func (s *Store) Send(ctx context.Context, ba roachpb.BatchRequest) (br *roachpb.BatchResponse, pErr *roachpb.Error)
Send fetches a range based on the header's replica, assembles method, args & reply into a Raft Cmd struct and executes the command using the fetched range. An incoming request may be transactional or not. If it is not transactional, the timestamp at which it executes may be higher than that optionally specified through the incoming BatchRequest, and it is not guaranteed that all operations are written at the same timestamp. If it is transactional, a timestamp must not be set - it is deduced automatically from the transaction. Should a transactional operation be forced to a higher timestamp (for instance due to the timestamp cache), the response will have a transaction set which should be used to update the client transaction.
func (*Store) SetRangeRetryOptions ¶
SetRangeRetryOptions sets the retry options used for this store. For unittests only. TODO(bdarnell): have the affected tests pass retry options in through the StoreContext.
func (*Store) SplitRange ¶
SplitRange shortens the original range to accommodate the new range. The new range is added to the ranges map and the rangesByKey btree.
func (*Store) TestingKnobs ¶
func (s *Store) TestingKnobs() *StoreTestingKnobs
TestingKnobs accessor.
func (*Store) WaitForInit ¶
func (s *Store) WaitForInit()
WaitForInit waits for any asynchronous processes begun in Start() to complete their initialization. In particular, this includes gossiping. In some cases this may block until the range GC queue has completed its scan. Only for testing.
type StoreContext ¶
type StoreContext struct { Clock *hlc.Clock DB *client.DB Gossip *gossip.Gossip StorePool *StorePool Transport *RaftTransport // SQLExecutor is used by the store to execute SQL statements in a way that // is more direct than using a sql.Executor. SQLExecutor sqlutil.InternalExecutor // RangeRetryOptions are the retry options when retryable errors are // encountered sending commands to ranges. RangeRetryOptions retry.Options // RaftTickInterval is the resolution of the Raft timer; other raft timeouts // are defined in terms of multiples of this value. RaftTickInterval time.Duration // RaftHeartbeatIntervalTicks is the number of ticks that pass between heartbeats. RaftHeartbeatIntervalTicks int // RaftElectionTimeoutTicks is the number of ticks that must pass before a follower // considers a leader to have failed and calls a new election. Should be significantly // higher than RaftHeartbeatIntervalTicks. The raft paper recommends a value of 150ms // for local networks. RaftElectionTimeoutTicks int // ScanInterval is the default value for the scan interval ScanInterval time.Duration // ScanMaxIdleTime is the maximum time the scanner will be idle between ranges. // If enabled (> 0), the scanner may complete in less than ScanInterval for small // stores. ScanMaxIdleTime time.Duration // ConsistencyCheckInterval is the default time period in between consecutive // consistency checks on a range. ConsistencyCheckInterval time.Duration // ConsistencyCheckPanicOnFailure causes the node to panic when it detects a // replication consistency check failure. ConsistencyCheckPanicOnFailure bool // AllocatorOptions configures how the store will attempt to rebalance its // replicas to other stores. AllocatorOptions AllocatorOptions // Tracer is a request tracer. Tracer opentracing.Tracer // If LogRangeEvents is true, major changes to ranges will be logged into // the range event log. LogRangeEvents bool // BlockingSnapshotDuration is the amount of time Replica.Snapshot // will wait before switching to asynchronous mode. Zero is a good // choice for production but non-zero values can speed up tests. // (This only blocks on the first attempt; it will not block a // second time if the generation is still in progress). BlockingSnapshotDuration time.Duration // AsyncSnapshotMaxAge is the maximum amount of time that an // asynchronous snapshot will be held while waiting for raft to pick // it up (counted from when the snapshot generation is completed). AsyncSnapshotMaxAge time.Duration TestingKnobs StoreTestingKnobs // contains filtered or unexported fields }
A StoreContext encompasses the auxiliary objects and configuration required to create a store. All fields holding a pointer or an interface are required to create a store; the rest will have sane defaults set if omitted.
func TestStoreContext ¶
func TestStoreContext() StoreContext
TestStoreContext has some fields initialized with values relevant in tests.
func (*StoreContext) Valid ¶
func (sc *StoreContext) Valid() bool
Valid returns true if the StoreContext is populated correctly. We don't check for Gossip and DB since some of our tests pass that as nil.
type StoreList ¶
type StoreList struct {
// contains filtered or unexported fields
}
StoreList holds a list of store descriptors and associated count and used stats for those stores.
type StorePool ¶
type StorePool struct {
// contains filtered or unexported fields
}
StorePool maintains a list of all known stores in the cluster and information on their health.
type StoreTestingKnobs ¶
type StoreTestingKnobs struct { // A callback to be called when executing every replica command. // If your filter is not idempotent, consider wrapping it in a // ReplayProtectionFilterWrapper. TestingCommandFilter storagebase.ReplicaCommandFilter // A callback to be called instead of panicking due to a // checksum mismatch in VerifyChecksum() BadChecksumPanic func([]ReplicaSnapshotDiff) // Disables the use of one phase commits. DisableOnePhaseCommits bool // A hack to manipulate the clock before sending a batch request to a replica. // TODO(kaneda): This hook is not encouraged to use. Get rid of it once // we make TestServer take a ManualClock. ClockBeforeSend func(*hlc.Clock, roachpb.BatchRequest) }
StoreTestingKnobs is a part of the context used to control parts of the system.
func (*StoreTestingKnobs) ModuleTestingKnobs ¶
func (*StoreTestingKnobs) ModuleTestingKnobs()
ModuleTestingKnobs is part of the base.ModuleTestingKnobs interface.
type Stores ¶
type Stores struct {
// contains filtered or unexported fields
}
A Stores provides methods to access a collection of stores. There's a visitor pattern and also an implementation of the client.Sender interface which directs a call to the appropriate store based on the call's key range. Stores also implements the gossip.Storage interface, which allows gossip bootstrap information to be persisted consistently to every store and the most recent bootstrap information to be read at node startup.
func NewStores ¶
NewStores returns a local-only sender which directly accesses a collection of stores.
func (*Stores) FirstRange ¶
func (ls *Stores) FirstRange() (*roachpb.RangeDescriptor, error)
FirstRange implements the RangeDescriptorDB interface. It returns the range descriptor which contains KeyMin.
func (*Stores) GetStoreCount ¶
GetStoreCount returns the number of stores this node is exporting.
func (*Stores) RangeLookup ¶
func (ls *Stores) RangeLookup( key roachpb.RKey, _ *roachpb.RangeDescriptor, considerIntents, useReverseScan bool, ) ([]roachpb.RangeDescriptor, []roachpb.RangeDescriptor, *roachpb.Error)
RangeLookup implements the RangeDescriptorDB interface. It looks up the descriptors for the given (meta) key.
func (*Stores) ReadBootstrapInfo ¶
func (ls *Stores) ReadBootstrapInfo(bi *gossip.BootstrapInfo) error
ReadBootstrapInfo implements the gossip.Storage interface. Read attempts to read gossip bootstrap info from every known store and finds the most recent from all stores to initialize the bootstrap info argument. Returns an error on any issues reading data for the stores (but excluding the case in which no data has been persisted yet).
func (*Stores) RemoveStore ¶
RemoveStore removes the specified store from the store map.
func (*Stores) Send ¶
func (ls *Stores) Send(ctx context.Context, ba roachpb.BatchRequest) (*roachpb.BatchResponse, *roachpb.Error)
Send implements the client.Sender interface. The store is looked up from the store map if specified by the request; otherwise, the command is being executed locally, and the replica is determined via lookup through each store's LookupRange method. The latter path is taken only by unit tests.
func (*Stores) VisitStores ¶
VisitStores implements a visitor pattern over stores in the storeMap. The specified function is invoked with each store in turn. Stores are visited in a random order.
func (*Stores) WriteBootstrapInfo ¶
func (ls *Stores) WriteBootstrapInfo(bi *gossip.BootstrapInfo) error
WriteBootstrapInfo implements the gossip.Storage interface. Write persists the supplied bootstrap info to every known store. Returns nil on success; otherwise returns first error encountered writing to the stores.
Source Files ¶
- abort_cache.go
- addressing.go
- allocator.go
- balancer.go
- command_queue.go
- doc.go
- gc_queue.go
- id_alloc.go
- intent_resolver.go
- log.go
- queue.go
- raft.go
- raft.pb.go
- raft_log_queue.go
- raft_transport.go
- range_tree.go
- rangetree.pb.go
- replica.go
- replica_command.go
- replica_consistency_queue.go
- replica_data_iter.go
- replica_gc_queue.go
- replica_raftstorage.go
- replica_range_lease.go
- replica_state.go
- replicate_queue.go
- reservation.go
- scanner.go
- split_queue.go
- stats.go
- store.go
- store_pool.go
- stores.go
- timestamp_cache.go
- track_raft_protos.go
- verify_queue.go
Directories ¶
Path | Synopsis |
---|---|
Package engine provides low-level storage.
|
Package engine provides low-level storage. |
enginepb
Package enginepb is a generated protocol buffer package.
|
Package enginepb is a generated protocol buffer package. |
Package storagebase is a generated protocol buffer package.
|
Package storagebase is a generated protocol buffer package. |