Comparing 9d418600f4...3612c2334a · git/git · GitHub

Comparing changes

base repository: git/git
base: 9d418600f4
head repository: git/git
compare: 3612c2334a
  • 12 commits
  • 8 files changed
  • 1 contributor

Commits on Jun 11, 2019

  1. repack: refactor pack deletion for future use

    The repack builtin deletes redundant pack-files and their
    associated .idx, .promisor, .bitmap, and .keep files. We will want
    to re-use this logic in the future for other types of repack, so
    pull the logic into 'unlink_pack_path()' in packfile.c.
    
    The 'ignore_keep' parameter is enabled for use in repack, but
    will be important for a future caller.
    
    Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
    derrickstolee authored and gitster committed Jun 11, 2019
    8434e85
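    A minimal C sketch of the deletion behavior this commit describes,
    assuming POSIX access() and unlink(). The name
    unlink_pack_path_sketch() is hypothetical and this is not the
    packfile.c code; it only models what the message states: remove a
    pack and its sibling files, and when ignore_keep is 0, leave the
    pack alone if a .keep file exists.

    ```c
    #define _POSIX_C_SOURCE 200809L
    #include <assert.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    /*
     * Hypothetical model of unlink_pack_path(): delete "<base>.pack" and its
     * .idx, .promisor, .bitmap and .keep siblings. If ignore_keep is 0 and
     * "<base>.keep" exists, delete nothing.
     * Returns 1 if the files were deleted, 0 if a .keep file preserved them.
     */
    static int unlink_pack_path_sketch(const char *pack_name, int ignore_keep)
    {
        static const char *exts[] = { ".pack", ".idx", ".promisor", ".bitmap", ".keep" };
        char base[4096];
        char path[4096 + 16];
        size_t len = strlen(pack_name);
        size_t i;

        assert(len < sizeof(base));
        memcpy(base, pack_name, len + 1);
        /* Strip a trailing ".pack" so every sibling shares one base name. */
        if (len >= 5 && strcmp(base + len - 5, ".pack") == 0)
            base[len - 5] = '\0';

        if (!ignore_keep) {
            snprintf(path, sizeof(path), "%s.keep", base);
            if (access(path, F_OK) == 0)
                return 0; /* a .keep file protects this pack */
        }

        for (i = 0; i < sizeof(exts) / sizeof(exts[0]); i++) {
            snprintf(path, sizeof(path), "%s%s", base, exts[i]);
            unlink(path); /* siblings that do not exist are simply skipped */
        }
        return 1;
    }
    ```

    In this model, repack would pass ignore_keep = 1, because it also
    removes the .keep files of redundant packs.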
  2. Docs: rearrange subcommands for multi-pack-index

    We will add new subcommands to the multi-pack-index, and that will
    make the documentation a bit messier. Clean up the 'verb'
    descriptions by renaming the concept to 'subcommand' and removing
    the reference to the object directory.
    
    Helped-by: Stefan Beller <sbeller@google.com>
    Helped-by: Szeder Gábor <szeder.dev@gmail.com>
    Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
    derrickstolee authored and gitster committed Jun 11, 2019
    81efa16
  3. multi-pack-index: prepare for 'expire' subcommand

    The multi-pack-index tracks objects in a collection of pack-files.
    Only one copy of each object is indexed, using the modified time
    of the pack-files to determine tie-breakers. It is possible to
    have a pack-file with no referenced objects because all objects
    have a duplicate in a newer pack-file.
    
    Introduce a new 'expire' subcommand to the multi-pack-index builtin.
    This subcommand will delete these unused pack-files and rewrite the
    multi-pack-index to no longer refer to those files. More details
    about the specifics will follow as the method is implemented.
    
    Add a test that verifies the 'expire' subcommand is correctly wired
    and that will remain valid once the subcommand is implemented.
    create a set of packs that should all have referenced objects and
    should not be removed during an 'expire' operation. The packs are
    created carefully to ensure they have a specific order when sorted
    by size. This will be important in a later test.
    
    Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
    derrickstolee authored and gitster committed Jun 11, 2019
    cff9711
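    The tie-breaking rule above can be modeled with a small, hypothetical
    C sketch (not midx.c): for every object, only the copy in the pack
    with the newest modified time is counted, so a pack whose objects all
    have newer duplicates ends up with zero references. Object IDs are
    reduced to integers, the lower-pack-id fallback for equal mtimes is
    this sketch's own choice, and it assumes no object appears twice in
    the same pack.

    ```c
    #include <assert.h>
    #include <string.h>

    /* One copy of an object found in one pack (object IDs reduced to integers). */
    struct obj_entry {
        unsigned oid;
        unsigned pack_id;
    };

    /* Does copy a win over copy b? The newer pack wins; equal mtimes fall
     * back to the lower pack id so the result is deterministic. */
    static int takes_precedence(const struct obj_entry *a, const struct obj_entry *b,
                                const long *pack_mtime)
    {
        if (pack_mtime[a->pack_id] != pack_mtime[b->pack_id])
            return pack_mtime[a->pack_id] > pack_mtime[b->pack_id];
        return a->pack_id < b->pack_id;
    }

    /* refs[p] = number of objects whose indexed copy lives in pack p. */
    static void count_referenced(const struct obj_entry *e, size_t n,
                                 const long *pack_mtime, size_t nr_packs,
                                 unsigned *refs)
    {
        size_t i, j;

        memset(refs, 0, nr_packs * sizeof(*refs));
        for (i = 0; i < n; i++) {
            int kept = 1;

            assert(e[i].pack_id < nr_packs);
            /* O(n^2) scan: fine for a sketch; a real index sorts by oid. */
            for (j = 0; j < n && kept; j++)
                if (j != i && e[j].oid == e[i].oid &&
                    takes_precedence(&e[j], &e[i], pack_mtime))
                    kept = 0;
            if (kept)
                refs[e[i].pack_id]++;
        }
    }
    ```

    A pack p with refs[p] == 0 is the kind of pack the 'expire'
    subcommand is meant to delete.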
  4. midx: simplify computation of pack name lengths

    Before writing the multi-pack-index, we compute the length of the
    pack-index names concatenated together. This forms the data in the
    pack name chunk, and we precompute it to compute chunk offsets.
    The value is also modified to fit alignment needs.
    
    Previously, this computation was coupled with adding packs from
    the existing multi-pack-index and the remaining packs in the object
    dir not already covered by the multi-pack-index.
    
    In anticipation of this becoming more complicated with the 'expire'
    subcommand, simplify the computation by centralizing it to a single
    loop before writing the file.
    
    Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
    derrickstolee authored and gitster committed Jun 11, 2019
    dba6175
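    A hypothetical sketch of the centralized length computation: sum each
    pack-index name plus its NUL terminator in a single loop, then pad to
    the chunk alignment. The 4-byte CHUNK_ALIGNMENT value and the skip[]
    array (standing in for packs that will be dropped) are assumptions
    for illustration, not names taken from midx.c.

    ```c
    #include <assert.h>
    #include <string.h>

    /* Assumed chunk alignment for this sketch. */
    #define CHUNK_ALIGNMENT 4

    /*
     * Length of the pack-name chunk: each counted name plus its NUL
     * terminator, padded up to a multiple of CHUNK_ALIGNMENT. Entries with
     * skip[i] set (skip may be NULL) are left out, e.g. packs being expired.
     */
    static size_t pack_name_chunk_len(const char **names, const int *skip, size_t nr)
    {
        size_t i, len = 0;

        assert(names != NULL || nr == 0);
        for (i = 0; i < nr; i++) {
            if (skip && skip[i])
                continue;
            len += strlen(names[i]) + 1; /* +1 for the NUL separator */
        }
        if (len % CHUNK_ALIGNMENT)
            len += CHUNK_ALIGNMENT - (len % CHUNK_ALIGNMENT);
        return len;
    }
    ```

    For example, "pack-a.idx" (10 bytes) and "pack-bb.idx" (11 bytes)
    need 23 bytes with terminators, which pads to 24.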
  5. midx: refactor permutation logic and pack sorting

    In anticipation of the expire subcommand, refactor the way we sort
    the packfiles by name. This will greatly simplify our approach to
    dropping expired packs from the list.
    
    First, create 'struct pack_info' to replace 'struct pack_pair'.
    This struct contains the necessary information about a pack,
    including its name, a pointer to its packfile struct (if not
    already in the multi-pack-index), and the original pack-int-id.
    
    Second, track the pack information using an array of pack_info
    structs in the pack_list struct. This simplifies the logic around
    the multiple arrays we were tracking in that struct.
    
    Finally, update get_sorted_entries() to not permute the pack-int-id
    and instead supply the permutation to write_midx_object_offsets().
    This requires sorting the packs after get_sorted_entries().
    
    Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
    derrickstolee authored and gitster committed Jun 11, 2019
    d01bf2e
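    The permutation idea can be sketched in C under simplifying
    assumptions: struct pack_info_sketch is a hypothetical cut-down
    version of the new 'struct pack_info' (name and original pack-int-id
    only), and sort_packs() is not a git function. After sorting by name,
    perm[] maps each original pack-int-id to its new position, so object
    entries can keep their original ids and be translated only when the
    object offsets are written.

    ```c
    #include <assert.h>
    #include <stdlib.h>
    #include <string.h>

    /* Hypothetical stand-in for 'struct pack_info'; the real struct also
     * carries a packfile pointer. */
    struct pack_info_sketch {
        const char *pack_name;
        unsigned orig_pack_int_id; /* position before sorting */
    };

    static int pack_info_cmp(const void *a, const void *b)
    {
        const struct pack_info_sketch *pa = a, *pb = b;
        return strcmp(pa->pack_name, pb->pack_name);
    }

    /* Sort packs by name and set perm[orig_id] = new pack-int-id. */
    static void sort_packs(struct pack_info_sketch *info, size_t nr, unsigned *perm)
    {
        size_t i;

        qsort(info, nr, sizeof(*info), pack_info_cmp);
        for (i = 0; i < nr; i++) {
            assert(info[i].orig_pack_int_id < nr);
            perm[info[i].orig_pack_int_id] = (unsigned)i;
        }
    }
    ```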
  6. multi-pack-index: implement 'expire' subcommand

    The 'git multi-pack-index expire' subcommand looks at the existing
    multi-pack-index, counts the number of objects referenced in each
    pack-file, deletes the pack-files with no referenced objects, and
    rewrites the multi-pack-index to no longer reference those packs.
    
    Refactor the write_midx_file() method to call write_midx_internal()
    which now takes an existing 'struct multi_pack_index' and a list
    of pack-files to drop (as specified by the names of their pack-
    indexes). As we write the new multi-pack-index, we drop those
    file names from the list of known pack-files.
    
    The expire_midx_packs() method removes the unreferenced pack-files
    after carefully closing the packs to avoid open handles.
    
    Test that a new pack-file that covers the contents of two other
    pack-files leads to those pack-files being deleted during the
    expire subcommand. Be sure to read the multi-pack-index to ensure
    it no longer references those packs.
    
    Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
    derrickstolee authored and gitster committed Jun 11, 2019
    19575c7
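    A hypothetical planning step for 'expire', sketched in C
    (plan_expire() and DROPPED are illustrative names, not midx.c code).
    Given how many multi-pack-index objects each pack references, it
    marks unreferenced packs for deletion and renumbers the survivors in
    their existing order. The optional has_keep[] array reflects the
    .keep behavior tested later in this series; keeping such packs in the
    rewritten index is an assumption of the sketch.

    ```c
    #include <assert.h>
    #include <stddef.h>

    #define DROPPED (-1)

    /*
     * remap[old] = new pack-int-id in the rewritten multi-pack-index, or
     * DROPPED if the pack is to be deleted. Packs with no referenced objects
     * are dropped unless has_keep (may be NULL) marks them as kept.
     * Survivors keep their relative order. Returns the number of survivors.
     */
    static size_t plan_expire(const unsigned *refs, const int *has_keep,
                              size_t nr, int *remap)
    {
        size_t i, kept = 0;

        assert(refs != NULL && remap != NULL);
        for (i = 0; i < nr; i++) {
            if (refs[i] == 0 && !(has_keep && has_keep[i]))
                remap[i] = DROPPED;
            else
                remap[i] = (int)kept++;
        }
        return kept;
    }
    ```

    Object entries that pointed at old id i would be rewritten to
    remap[i]; since dropped packs have no referenced objects, no entry
    maps to DROPPED.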
  7. multi-pack-index: prepare 'repack' subcommand

    The multi-pack-index is useful in environments that have many
    pack-files and cannot repack the object store into a single
    pack-file. However, many of these pack-files are likely to be
    rather small, and could be repacked into a slightly larger
    pack-file without too much effort. It may also be important to
    keep the object store highly available, so that the repack
    operation does not interrupt concurrent git commands.
    
    Introduce a 'repack' subcommand to 'git multi-pack-index' that
    takes a '--batch-size' option. The subcommand will inspect the
    multi-pack-index for referenced pack-files whose size is smaller
    than the batch size, until it has collected a list of pack-files
    whose sizes sum to more than the batch size. Then, a new pack-file
    will be created containing the objects from those pack-files that
    are referenced by the multi-pack-index. The resulting pack is
    likely to be smaller than the batch size due to compression and
    because some objects in the pack-files may have duplicate copies
    in other pack-files.
    
    The current change introduces the command-line arguments, and we
    add a test that ensures we parse these options properly. Since
    we specify a small batch size, we will guarantee that future
    implementations do not change the list of pack-files.
    
    In addition, we hard-code the modified times of the packs in
    the pack directory to ensure the list of packs sorted by modified
    time matches the order if sorted by size (ascending). This will
    be important in a future test.
    
    Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
    derrickstolee authored and gitster committed Jun 11, 2019
    2af890b
  8. midx: implement midx_repack()

    To repack with a non-zero batch-size, first sort all pack-files by
    their modified time. Second, walk those pack-files from oldest
    to newest, compute their expected size, and add the packs to a list
    if they are smaller than the given batch-size. Stop when the total
    expected size is at least the batch size.
    
    If the batch size is zero, select all packs in the multi-pack-index.
    
    Finally, collect the objects from the multi-pack-index that are in
    the selected packs and send them to 'git pack-objects'. Write a new
    multi-pack-index that includes the new pack.
    
    Using a batch size of zero is very similar to a standard 'git repack'
    command, except that we do not delete the old packs and instead rely
    on the new multi-pack-index to prevent new processes from reading the
    old packs. This does not disrupt other Git processes that are currently
    reading the old packs based on the old multi-pack-index.
    
    While first designing a 'git multi-pack-index repack' operation, I
    started by collecting the batches based on the actual size of the
    objects instead of the size of the pack-files. This allows repacking
    a large pack-file that has very few referenced objects. However, this
    came at a significant cost of parsing pack-files instead of simply
    reading the multi-pack-index and getting the file information for
    the pack-files. The "expected size" version provides similar
    behavior, but could skip a pack-file if the average object size is
    much larger than the actual size of the referenced objects, or
    could create a large pack if the actual size of the referenced
    objects is larger than the expected size.
    
    Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
    derrickstolee authored and gitster committed Jun 11, 2019
    ce1e4a1
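    The batch selection described above can be sketched in C. The
    struct, the function names, and the estimate
    expected = pack_size * referenced / num_objects are assumptions (the
    commit says only "expected size"). Returning nothing when fewer than
    two packs are chosen is also this sketch's choice, consistent with
    the later test that 'repack' does nothing when there is only one
    pack-file.

    ```c
    #include <assert.h>
    #include <stdint.h>
    #include <stdlib.h>

    /* Hypothetical per-pack record; the field names are illustrative. */
    struct repack_candidate {
        unsigned pack_int_id;
        long mtime;
        uint64_t pack_size;    /* bytes on disk */
        uint32_t num_objects;  /* objects stored in the pack */
        uint32_t referenced;   /* objects the multi-pack-index takes from it */
    };

    static int by_mtime(const void *a, const void *b)
    {
        const struct repack_candidate *ca = a, *cb = b;
        return (ca->mtime > cb->mtime) - (ca->mtime < cb->mtime);
    }

    /* Assumed estimate: the referenced share of the pack's bytes. */
    static uint64_t expected_size(const struct repack_candidate *c)
    {
        if (c->num_objects == 0)
            return 0;
        return c->pack_size * c->referenced / c->num_objects;
    }

    /*
     * Write the chosen pack-int-ids to out[] and return how many were
     * chosen (0 means "nothing to repack"). batch_size == 0 selects every
     * pack; otherwise packs are walked oldest first (c[] is sorted in place).
     */
    static size_t select_batch(struct repack_candidate *c, size_t nr,
                               uint64_t batch_size, unsigned *out)
    {
        size_t i, count = 0;
        uint64_t total = 0;

        assert(c != NULL || nr == 0);
        if (batch_size == 0) {
            for (i = 0; i < nr; i++)
                out[count++] = c[i].pack_int_id;
            return count < 2 ? 0 : count;
        }

        qsort(c, nr, sizeof(*c), by_mtime);
        for (i = 0; i < nr && total < batch_size; i++) {
            uint64_t est = expected_size(&c[i]);

            if (est >= batch_size)
                continue; /* already at least one batch on its own */
            out[count++] = c[i].pack_int_id;
            total += est;
        }
        if (total < batch_size || count < 2)
            return 0;
        return count;
    }
    ```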
  9. multi-pack-index: test expire while adding packs

    During development of the multi-pack-index expire subcommand, a
    version went out that improperly computed the pack order if a new
    pack was introduced while other packs were being removed. Part of
    the subtlety of the bug involved the new pack being placed before
    other packs that already existed in the multi-pack-index.
    
    Add a test to t5319-multi-pack-index.sh that catches this issue.
    The test adds new packs that cause another pack to be expired, and
    creates new packs that are lexicographically sorted before and
    after the existing packs.
    
    Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
    derrickstolee authored and gitster committed Jun 11, 2019
    d274331
  10. midx: add test that 'expire' respects .keep files

    The 'git multi-pack-index expire' subcommand may delete packs that
    are not needed from the perspective of the multi-pack-index. If
    a pack has a .keep file, then we should not delete that pack. Add
    a test that ensures we preserve a pack that would otherwise be
    expired. First, create a new pack that contains every object in
    the repo, then add it to the multi-pack-index. Then create a .keep
    file for a pack starting with "a-pack" that was added in the
    previous test. Finally, expire and verify that the pack remains
    and the other packs were expired.
    
    Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
    derrickstolee authored and gitster committed Jun 11, 2019
    10bfa3f
  11. t5319-multi-pack-index.sh: test batch size zero

    The 'git multi-pack-index repack' command can take a batch size of
    zero, which creates a new pack-file containing all objects in the
    multi-pack-index. The first 'repack' command will create one new
    pack-file, and an 'expire' command after that will delete the old
    pack-files, as they no longer contain any referenced objects in the
    multi-pack-index.
    
    We must remove the .keep file that was added in the previous test
    in order to expire that pack-file.
    
    Also test that a 'repack' will do nothing if there is only one
    pack-file.
    
    Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
    derrickstolee authored and gitster committed Jun 11, 2019
    b526d8c

Commits on Jul 1, 2019

  1. t5319: use 'test-tool path-utils' instead of 'ls -l'

    Using 'ls -l' and parsing the columns to find file sizes is
    problematic when the platform could report the owner as a name
    with spaces. Instead, use the 'test-tool path-utils file-size'
    command to list only the sizes.
    
    Reported-by: Johannes Sixt <j6t@kdbg.org>
    Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
    Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
    derrickstolee authored and gitster committed Jul 1, 2019
    3612c23
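    The difference can be illustrated in C (a sketch, not the test-tool
    source): file_size() asks stat() for the size directly, while the
    hypothetical ls_field5() mimics taking the fifth column of 'ls -l'
    output. Once the owner name contains a space, the fifth column is no
    longer the size. The sample 'ls -l' lines in the usage are made up
    for illustration.

    ```c
    #define _POSIX_C_SOURCE 200809L
    #include <assert.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/stat.h>

    /* Size of path in bytes, or -1 if it cannot be stat'ed. */
    static long long file_size(const char *path)
    {
        struct stat st;

        if (stat(path, &st))
            return -1;
        return (long long)st.st_size;
    }

    /*
     * Naive 'ls -l' parsing: copy the fifth whitespace-separated field into
     * out (size n). Returns 0 on success, -1 if there is no fifth field.
     */
    static int ls_field5(const char *line, char *out, size_t n)
    {
        char buf[512];
        char *tok;
        int field = 0;

        assert(strlen(line) < sizeof(buf));
        strcpy(buf, line);
        for (tok = strtok(buf, " \t"); tok; tok = strtok(NULL, " \t")) {
            if (++field == 5) {
                snprintf(out, n, "%s", tok);
                return 0;
            }
        }
        return -1;
    }
    ```

    For "-rw-r--r-- 1 alice staff 1234 ..." the fifth field is "1234",
    but for "-rw-r--r-- 1 John Doe staff 1234 ..." it is "staff".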