Git Learning - WIP

Git Learning · Prologue

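I recently realized that my understanding of Git is not deep enough,
so I am relearning it and taking a few notes along the way.

Of course, there will also be times
when I dig deeper and deeper and find I can't keep up (:з」∠)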

Chapter 1 About Git

Version Control Systems

A version control system (VCS) exists to make it easier to manage different versions of files.

It allows you to revert selected files back to a previous state, revert the entire project back to a previous state, compare changes over time, see who last modified something that might be causing a problem, who introduced an issue and when, and more. Using a VCS also generally means that if you screw things up or lose files, you can easily recover. In addition, you get all this for very little overhead.

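The most primitive form of version control is something like putting a timestamp in the name of a Word document.
That clearly breaks down at some point -> after each change it is not easy for the author to tell which part was actually modified, and once version control has to cover multiple files, timestamps become completely impractical (or at least extremely complicated).

The simplest fix is a database on the local machine that records the changes.

People have tried different approaches before, such as RCS and SVN.
Programming, however, requires different engineers to work together, so CVCS (Centralized Version Control Systems) became an early solution.

This server-client model clearly has advantages over keeping a local DB on each client.
It has downsides too, of course: the server is a single point of failure, and when it goes down the whole system is gg (a local VCS has the same problem).

This is where DVCS (Distributed Version Control Systems) shows its strength (besides Git there are also Mercurial, Bazaar, and Darcs, but this is an oligopoly market 🐶).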

In a DVCS, clients don’t just check out the latest snapshot of the files; rather, they fully mirror the repository, including its full history. Thus, if any server dies, and these systems were collaborating via that server, any of the client repositories can be copied back up to the server to restore it. Every clone is really a full backup of all the data.

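This idea is actually much the same as blockchain! Everyone keeps a complete backup and record. It can look redundant at times (disk is cheap, after all), but it really is the most suitable approach.

By the way, Git was born out of the Linux open-source community.
Nothing more to say. Respect.

What Is Git

The book's authors point out that understanding Git's core idea is very important; that is probably also why Git stands head and shoulders above the rest today.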

These other systems (CVS, Subversion, Perforce, Bazaar, and so on) think of the information they store as a set of files and the changes made to each file over time (this is commonly described as delta-based version control).


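The book's figure for this model makes it clear that these systems keep a record of incremental changes (deltas).

Git's approach is different: it works with snapshots.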

Git doesn’t think of or store its data this way. Instead, Git thinks of its data more like a series of snapshots of a miniature filesystem. With Git, every time you commit, or save the state of your project, Git basically takes a picture of what all your files look like at that moment and stores a reference to that snapshot. To be efficient, if files have not changed, Git doesn’t store the file again, just a link to the previous identical file it has already stored. Git thinks about its data more like a stream of snapshots.

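This different approach opens up more possibilities, such as editing and committing locally, which are restricted under a CVCS.
And no change escapes Git's eyes: everything is checksummed, and every change ends up recorded under a SHA-1 hash value.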
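To make the hash idea concrete, here is a minimal sketch using git hash-object, which prints the object ID Git would assign to some content. The sample strings are made up for illustration, and it assumes Git's default SHA-1 object format.

```shell
set -e
# Identical content always maps to the same object ID...
id_a=$(printf 'hello\n' | git hash-object --stdin)
id_b=$(printf 'hello\n' | git hash-object --stdin)
# ...while changing a single character yields a different ID.
id_c=$(printf 'hellO\n' | git hash-object --stdin)
echo "$id_a"
echo "$id_b"
echo "$id_c"
```

Because the ID is derived from the content, Git can tell that something changed without comparing files byte by byte.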

Generally speaking, Git only adds data rather than deleting it, which also makes it easy for developers to experiment.

The Three States

Modified means that you have changed the file but have not committed it to your database yet.
Staged means that you have marked a modified file in its current version to go into your next commit snapshot.
Committed means that the data is safely stored in your local database.
Anyone who uses Git regularly should be very familiar with this.
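The three states can be watched with git status --short. This is only a sketch in a throwaway repository; the file name notes.txt and the demo identity are placeholders I made up.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"

echo "v1" > notes.txt
git add notes.txt
git commit -q -m "Add notes"

echo "v2" >> notes.txt
modified=$(git status --short)    # " M notes.txt": changed but not staged

git add notes.txt
staged=$(git status --short)      # "M  notes.txt": marked for the next commit

git commit -q -m "Update notes"
committed=$(git status --short)   # empty: stored in the local database

printf '[%s]\n[%s]\n[%s]\n' "$modified" "$staged" "$committed"
```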

CLI / UI / Others

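Obviously coders should use the CLI.
If you still have doubts at this point, or don't know how to use the Git CLI, the book's authors suggest learning some computer basics first 🐶
Also, the book is based on Git 2.8.0, but it should already cover the vast majority of features.

Mac users can simply run brew install git.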

Git Config Setup

git help config # show the help page (same as git config --help)

# show the current settings
git config -l

git --version
# git version 2.23.0

# ~/Users/*_*
[user]
email = monster.cmu@gmail.com
name = monster
[filter "lfs"]
required = true
clean = git-lfs clean -- %f
smudge = git-lfs smudge -- %f
process = git-lfs filter-process

# settings I might adopt after reading the book?
color.status=auto
color.branch=auto
color.interactive=auto
color.diff=auto
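A minimal sketch of how the color settings above could be written with git config. It uses --local in a throwaway repository so nothing in ~/.gitconfig is touched; swapping in --global would apply the same keys to every repository.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q

# --local writes to .git/config of this repository only
git config --local color.status auto
git config --local color.branch auto
git config --local color.interactive auto
git config --local color.diff auto

# list every color.* key that was just set
git config --local --get-regexp '^color\.'
```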

Chapter 2 Git Basics

This chapter feels fairly basic, so I won't go into the details; you can study it on your own.

Getting a Git Repository

# you can either
mkdir myGit
cd myGit
git init # create a new local Git repository

# or
git clone https://github.com/libgit2/libgit2 mylibgit # clone a remote repository; may run into permission issues

File States in Detail and Common Commands

Personally I think this explains things better than the three states do.

Remember that each file in your working directory can be in one of two states: tracked or untracked. Tracked files are files that were in the last snapshot; they can be unmodified, modified, or staged. In short, tracked files are files that Git knows about.

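Common commands include, but are not limited to, the list below; I list them first and add explanations later where needed.
One problem of mine is that I'm too fond of the oh-my-zsh aliases, such as gss and gcam.
Over time I've noticed that this leaves gaps in what I actually know.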

git status
git add
git commit
git diff --cached # probably quite useful for me 🤣

# example
$ git status -s
 M README
MM Rakefile
A  lib/git.rb
M  lib/simplegit.rb
?? LICENSE.txt

New files that aren’t tracked have a ?? next to them, new files that have been added to the staging area have an A, modified files have an M and so on. There are two columns to the output — the left-hand column indicates the status of the staging area and the right-hand column indicates the status of the working tree. So for example in that output, the README file is modified in the working directory but not yet staged, while the lib/simplegit.rb file is modified and staged. The Rakefile was modified, staged and then modified again, so there are changes to it that are both staged and unstaged.

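.gitignore is a great thing: it tells Git to ignore files that don't belong in the repository.
The basic syntax is glob patterns (not regular expressions); for more details see https://github.com/github/gitignore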
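A small sketch of glob patterns in action, checked with git check-ignore. The patterns and file names are made up for illustration.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q

# one glob pattern per line: all .log files, and everything under build/
printf '%s\n' '*.log' 'build/' > .gitignore
mkdir build
touch debug.log build/app.bin notes.txt

# check-ignore prints only the paths that match an ignore rule
ignored=$(git check-ignore debug.log build/app.bin notes.txt || true)
# status lists what Git still sees as untracked
visible=$(git status --short)

printf 'ignored:\n%s\n\nvisible:\n%s\n' "$ignored" "$visible"
```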

Different cases for git rm

  • Directly removing a file you no longer want to track
  • A modified file that has been staged but is not actually needed
    git rm --cached README

    # Note the backslash (\) in front of the *. This is necessary because Git does its own filename expansion in addition to your shell’s filename expansion. This command removes all files that have the .log extension in the log/ directory.
    git rm log/\*.log

    # This command removes all files whose names end with a ~.
    git rm \*~
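A quick sketch of what git rm --cached does, in a throwaway repository with a placeholder README: the file leaves the index but stays on disk.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"

echo "local settings" > README
git add README
git commit -q -m "Add README"

git rm -q --cached README      # stop tracking, keep the working copy
tracked=$(git ls-files)        # files still in the index
state=$(git status --short)    # staged deletion plus an untracked README

printf 'tracked: [%s]\n%s\n' "$tracked" "$state"
```

Without --cached, git rm would delete the file from the working directory as well.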

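Viewing the Commit History

git log aka glg 🐶
Useful options for git log --pretty=format
There are far too many tricks to play with here. Honestly I have no use for them right now; when I really need them, a GUI probably shows things more clearly.
And for an ordinary simple project, the graph tells you absolutely nothing.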

$ git log --pretty=format:"%h %s" --graph
* 2d3acf9 ignore errors from SIGCHLD on trap
* 5e3ee11 Merge branch 'master' of git://github.com/dustin/grit
|\
| * 420eac9 Added a method for getting the current branch.
* | 30e367c timeout code and tests
* | 5a09431 add timeout protection to grit
* | e1193f8 support for heads with slashes in them
|/
* d6016bc require time for xmlschema
* 11d191e Merge branch 'defunkt' into local
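For reference, a small sketch of a few --pretty=format placeholders on a throwaway repository; the commits and the demo identity are made up.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"

for n in 1 2 3; do
  echo "$n" > "file$n.txt"
  git add "file$n.txt"
  git commit -q -m "Add file $n"
done

# %h = abbreviated hash, %an = author name, %ar = relative date, %s = subject
git log --pretty=format:"%h | %an | %ar | %s"
echo

subjects=$(git log --pretty=format:"%s")   # newest commit first
```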

Undo

Use with caution; you need to know exactly what you are doing.

Be careful, because you can’t always undo some of these undos. This is one of the few areas in Git where you may lose some work if you do it wrong.

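A common one is git commit --amend
In practice, though, what I use more often is probably git rebase -i; for details see an article I wrote back in the day LOL
According to the Git history I clearly did write it, but unfortunately that article is currently lost =.= which is a bit unexpected
I'll see how to fix this later

The problems really keep piling up, and there is no good solution for now 😂

The full git status output actually carries a lot of information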

On branch master
Your branch is ahead of 'origin/master' by 1 commit.
  (use "git push" to publish your local commits)

Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
        deleted:    db.json

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
        modified:   source/_posts/Git-Learning.md

# The book says to use git reset HEAD <file>... to unstage. git restore is a newer command (added in Git 2.23) that does the same job here

It’s true that git reset can be a dangerous command, especially if you provide the --hard flag. However, in the scenario described above, the file in your working directory is not touched, so it’s relatively safe.
Honestly, I went straight for --hard while debugging and tidying things up. I feel it should be used sparingly unless really necessary; it causes plenty of trouble 🤣

It’s important to understand that git checkout -- <file> is a dangerous command. Any local changes you made to that file are gone — Git just replaced that file with the most recently-committed version. Don’t ever use this command unless you absolutely know that you don’t want those unsaved local changes.
A friendly reminder, again 🙂
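A minimal sketch of the two undo steps discussed above, assuming Git 2.23 or later (where git restore exists). The file name and identity are placeholders.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"

echo "v1" > notes.txt
git add notes.txt
git commit -q -m "Add notes"

echo "v2" > notes.txt
git add notes.txt

# Unstage only; here this has the same effect as: git reset HEAD notes.txt
git restore --staged notes.txt
after_unstage=$(git status --short)   # " M notes.txt": the edit is still on disk

# Discard the edit; here this has the same effect as: git checkout -- notes.txt
git restore notes.txt
after_discard=$(cat notes.txt)        # "v1": the local change is gone for good

printf '[%s]\n[%s]\n' "$after_unstage" "$after_discard"
```

The first step is safe because the working file is untouched; the second is the one that actually throws work away.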

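Working with Remotes

git remote -v lists each remote together with its URL
What confuses me a bit: if several remotes with different versions exist at the same time, won't that throw development into even more chaos?
Complex setups may really need this, but I feel it is better when all the remotes stay in sync (one possibility, of course, is that different remotes correspond to different branches)

git remote show origin describes the given remote in more detail; an example follows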

git remote show origin
* remote origin
URL: https://github.com/my-org/complex-project
Fetch URL: https://github.com/my-org/complex-project
Push URL: https://github.com/my-org/complex-project
HEAD branch: master
Remote branches:
master tracked
dev-branch tracked
markdown-strip tracked
issue-43 new (next fetch will store in remotes/origin)
issue-45 new (next fetch will store in remotes/origin)
refs/remotes/origin/issue-11 stale (use 'git remote prune' to remove)
Local branches configured for 'git pull':
dev-branch merges with remote dev-branch
master merges with remote master
Local refs configured for 'git push':
dev-branch pushes to dev-branch (up to date)
markdown-strip pushes to markdown-strip (up to date)
master pushes to master (up to date)

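git fetch only downloads new content; it does not merge anything
git pull can be thought of as roughly fetch + merge, provided the current branch tracks a remote branch (otherwise Git doesn't know what to match and merge)
git push pushes local updates to the remote, which obviously requires write access -> unless it's your own project, in company collaboration changes should still go through code review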
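The fetch versus merge split can be seen with a local bare repository standing in for the remote. Everything here (the seed, alice, and bob directories, the identity) is a made-up setup for illustration.

```shell
set -e
work=$(mktemp -d) && cd "$work"
# placeholder identity, exported so it applies in every clone below
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

git init -q seed
(cd seed && echo "v1" > notes.txt && git add notes.txt && git commit -q -m "Initial commit")
git clone -q --bare seed remote.git      # plays the role of the server
git clone -q remote.git alice
git clone -q remote.git bob

# bob publishes a new commit
(cd bob && echo "v2" >> notes.txt && git commit -q -am "Update notes" && git push -q origin HEAD)

cd alice
before=$(git rev-parse HEAD)
git fetch -q                             # downloads bob's commit...
after_fetch=$(git rev-parse HEAD)        # ...but the local branch has not moved
upstream=$(git rev-parse '@{u}')         # the remote-tracking branch has
git merge -q --ff-only '@{u}'            # the merge half of "git pull"
after_merge=$(git rev-parse HEAD)

printf 'before:      %s\nafter fetch: %s\nupstream:    %s\nafter merge: %s\n' \
  "$before" "$after_fetch" "$upstream" "$after_merge"
```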

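Tags 🤣

This feature is, how to put it, very nice hhhhhh, though I really haven't used it much (:з」∠)

git tag -l "v1.8.5*" -> wildcard matching against tag names; this example lists the tags in the v1.8.5 series

There are two types: lightweight and annotated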

A lightweight tag is very much like a branch that doesn’t change — it’s just a pointer to a specific commit.
Annotated tags, however, are stored as full objects in the Git database. They’re checksummed; contain the tagger name, email, and date; have a tagging message; and can be signed and verified with GNU Privacy Guard (GPG). It’s generally recommended that you create annotated tags so you can have all this information; but if you want a temporary tag or for some reason don’t want to keep the other information, lightweight tags are available too.

  • lightweight
    • just run git tag <lightweight tag name>
  • annotated
    • git tag -a <tag name> -m "<message>"
    • you can also tag a commit from the past, e.g. git tag -a <tag name> <SHA-1 id>
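The difference between the two tag types shows up in the object type each tag name resolves to. A sketch with made-up tag names:

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"

echo "v1" > notes.txt
git add notes.txt
git commit -q -m "Add notes"

git tag v0.1-light                        # lightweight: just a pointer
git tag -a v0.1 -m "First annotated tag"  # annotated: a full tag object

light_type=$(git cat-file -t v0.1-light)      # commit
annotated_type=$(git cat-file -t v0.1)        # tag

git tag -l "v0.1*"
echo "$light_type / $annotated_type"
```

Both names still lead to the same commit; the annotated one simply goes through an extra object that carries the tagger, date, and message.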

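A side note: yesterday I spent quite a while on Zhihu reading about SHA-1
How do you evaluate Google's February 23 announcement of a SHA-1 collision? - answer by 刘巍然-学酥 - Zhihu
How to put it: in theory it can indeed be insecure, but for ordinary applications it is more than enough
It reminds me of what my undergraduate teacher taught: the question in cryptography is not whether something can be broken in theory, but how the cost (money, time) compares with the value of the information obtained by breaking it

git push origin --tags uploads all tags; by default git push does not transfer tags to the remote
git push origin --delete <tag name> deletes a tag from the remote

Sure enough, after using it for a while I ran into something odd: it seems tag names have to be unique? That can't be right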

➜  ✗ git tag  show 4402cf6f9734f405a0c4e901ede3545fdacc8cc7
fatal: tag 'show' already exists

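-> Maybe it does make sense. Together with features like git checkout <tag name>, tags look like a hashmap in which each key (tag name) maps to exactly one commit. Is that a bit dumb? (In hindsight, git tag show <commit> does not display anything: it creates a tag named show, so a second run fails because the name is taken; git show <tag name> is the command for displaying a tag.)
-> Not necessarily. A tag is just a key for quickly looking up a commit. Writing good commit messages is a convenient approach too (I feel this is what people rely on more in day-to-day work, then reading the detailed descriptions in git log to find a commit)

alias

Judging from the title this is right up my alley, but there is already the oh-my-zsh alias collection, so it may not be that useful anymore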

Chapter 3 Git Branch

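This is probably the most important part of Git - the so-called killer feature
Although I feel branches are not used all that rigorously in real development? That may have to do with the projects I've worked on; when many people collaborate, the importance of branches goes without saying
That said, a typical project nowadays is a micro service, which usually doesn't have a huge amount of code, branches, or developers

I won't repeat the basic commands and diagrams 🤣
Simply put, branches are great, and they really are: because of how they are implemented underneath, a new branch costs very little, and creating and modifying branches is fast
This is fundamentally different from other VCSs, though I have never tried those myself

Switching, Merging, and Managing Branches

A common problem is that switching branches is refused; stash / clean usually solves it, as covered later in this post

git merge <branch> (merges the named branch into the current branch) -> I never quite get used to this command; it's probably the difference between top-down and bottom-up thinking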


$ git checkout master
Switched to branch 'master'
$ git merge iss53
Merge made by the 'recursive' strategy.
index.html | 1 +
1 file changed, 1 insertion(+)

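But things are never that simple 🐶 merge conflicts happen all the time
git status lists the conflicted files, which then have to be edited by hand
The conflicting sections are usually delimited by <<<<<<<, =======, and >>>>>>>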
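A sketch that provokes a conflict on purpose so the markers can be seen. The branch name iss53 follows the example above; the file post.md and its contents are made up.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"
base=$(git symbolic-ref --short HEAD)     # whatever the default branch is called

echo "title: draft" > post.md
git add post.md
git commit -q -m "Add post"

git checkout -q -b iss53
echo "title: from iss53" > post.md
git commit -q -am "Retitle on iss53"

git checkout -q "$base"
echo "title: from base" > post.md
git commit -q -am "Retitle on base"

# Both branches changed the same line, so the merge stops with a conflict
if git merge iss53 >/dev/null 2>&1; then merged=yes; else merged=no; fi
conflicted=$(git diff --name-only --diff-filter=U)   # unmerged paths

echo "merged: $merged, conflicted: $conflicted"
cat post.md
```

After editing the file down to the intended content, git add post.md followed by git commit completes the merge.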

Branch Workflows

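This feels more like a set of best practices

In practice, though, it may only suit small, homework-sized projects
Large projects, as far as I can tell, almost always have a single main branch and rely on deploying to different stages for testing rather than on separate code branches
I've heard that Google and Facebook work this way too, possibly even more extreme, with no other branches in the source at all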

Remote Branch

I read through an example, and honestly, from my own practical experience it doesn't seem very useful
https://git-scm.com/book/en/v2/Git-Branching-Remote-Branches

I plan to skip this for now

Rebasing

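It doesn't seem hugely useful; it mainly makes the git log history look cleaner,
avoiding forks and keeping the history linear

Overall it feels like a nice feature, suited to developing several features at the same time
For example, if I'm writing several blog posts at once, AWS migration may be only half written when I cut a branch from the middle to start Git Learning, which easily causes some trouble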

$ git rebase --onto master server client

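The command above means: "Take the client branch, find the changes made since the common ancestor of the client and server branches, and replay them on top of the master branch." It is a little complicated to understand, but the result is pretty cool.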
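To check my understanding, here is a sketch that rebuilds a history of that shape and runs the command. The commit names C1 to C9 and the helper function are my own labels for illustration, not the book's exact setup.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is named master
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"

# hypothetical helper: one commit that adds one file named after the commit
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit C1; commit C2
git checkout -q -b server; commit C3
git checkout -q -b client; commit C8; commit C9   # client forks off server
git checkout -q server;    commit C4
git checkout -q master;    commit C5

# replay only the commits in server..client (C8, C9) on top of master
git rebase -q --onto master server client

replayed=$(git log --format=%s master..client)
git log --format=%s client
```

After the rebase, client sits directly on top of master and no longer contains C3, which belongs to server.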

And also:

Do not rebase commits that exist outside your repository and people may have based work on them.

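I feel rebase needs more careful thought when many people collaborate; honestly, some of the scenarios described there are a bit messy
It still comes down to practice and which approach fits better
More often than not, though, we probably don't have the time to agonize over it and set rules hahahaha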

One point of view on this is that your repository’s commit history is a record of what actually happened. It’s a historical document, valuable in its own right, and shouldn’t be tampered with. From this angle, changing the commit history is almost blasphemous; you’re lying about what actually transpired. So what if there was a messy series of merge commits? That’s how it happened, and the repository should preserve that for posterity.
The opposing point of view is that the commit history is the story of how your project was made. You wouldn’t publish the first draft of a book, and the manual for how to maintain your software deserves careful editing. This is the camp that uses tools like rebase and filter-branch to tell the story in the way that’s best for future readers.

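The rule in one sentence: only rebase local changes that have not been pushed or shared with anyone, in order to clean up history; never rebase commits that have already been pushed somewhere else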

Chapter 4..7+

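Judging by the table of contents, I have some doubts about how practical chapters 4+ are; much of it leans toward theory and low-level internals
I don't think it needs deep study, just as you can use a refrigerator without understanding how its cooling works

On the Server

Obviously I host everything on AWS CodeCommit or GitHub
This part is basically about setting things up on the server side, so I skipped it

Distributed Workflows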

Centralized Workflow
It makes a very interesting point: although Git is a DVCS, in practice it is often used in what is really a CVCS fashion 🤣

Integration-Manager Workflow

The project maintainer pushes to their public repository.
A contributor clones that repository and makes changes.
The contributor pushes to their own public copy.
The contributor sends the maintainer an email asking them to pull changes.
The maintainer adds the contributor’s repository as a remote and merges locally.
The maintainer pushes merged changes to the main repository.

Dictator and Lieutenants Workflow
Only extremely complex projects are managed this way, such as the Linux kernel

One passage discussing git commit messages really cracked me up

Do as we say, not as we do.
For the sake of brevity, many of the examples in this book don’t have nicely-formatted commit messages like this; instead, we simply use the -m option to git commit. In short, do as we say, not as we do.

Some of the later discussion also feels a bit complicated =.=
https://git-scm.com/book/en/v2/Distributed-Git-Contributing-to-a-Project

GitHub Tutorial

I pretty much skipped this part hhh

Git Tools

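git reflog shows a list of recent records of where HEAD has pointed
Some of the other tools are still a bit complicated, mostly involving changes across multiple branches

git stash is the usual command for shelving work in progress
A very popular option is the --keep-index option of the stash command. It leaves everything you have already staged with git add in the index (those staged changes are still saved in the stash as well).

git clean removes untracked files; a safer alternative is git stash --all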
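A sketch of the safer route: preview with git clean -n, then shelve everything with git stash --all so it can be brought back. The file names are placeholders.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"

echo "v1" > notes.txt
git add notes.txt
git commit -q -m "Add notes"

echo "scratch" > scratch.txt              # an untracked file

preview=$(git clean -n)                   # dry run: reports what would be deleted
git stash push -q --all                   # shelves untracked (and ignored) files too

if [ -e scratch.txt ]; then during=present; else during=absent; fi

git stash pop -q                          # unlike git clean -f, this is reversible
echo "$preview / $during"
```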

//TODO 7.6

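Honestly, not much of what follows deserves a long write-up
I'll try to pick out the parts that are most useful in practice

For example, squashing commits (there is no git squash command; it is done with git rebase -i or git merge --squash), which is something I learned back during my internship lol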
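One non-interactive way to squash is git merge --squash, sketched below. The branch name feature and the WIP commits are made up; git rebase -i gives the same result interactively.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"
base=$(git symbolic-ref --short HEAD)     # whatever the default branch is called

echo "base" > notes.txt
git add notes.txt
git commit -q -m "Initial commit"

git checkout -q -b feature
for n in 1 2 3; do
  echo "step $n" >> notes.txt
  git commit -q -am "WIP $n"
done

git checkout -q "$base"
git merge -q --squash feature             # stages the combined change, no commit yet
git commit -q -m "Add feature as one commit"

count=$(git rev-list --count HEAD)        # 2: the initial commit plus the squashed one
git log --format=%s
```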
