Talking about your options:
1) my concern here is that we would have 2 repos with parallel lifecycles and nothing forcing them to stay aligned. A change in a dissector would benefit from a test case, but that test case could only be proposed in happy-shark after the code is merged in the main repo. That would slow down the process, wouldn't it?
2) this is the current situation. Ideal in the sense that a single change carries both the code and the test case. Suboptimal because, as the number of test cases grows, the repo gets too heavy, as you said.
If the concern is keeping the repo from getting too heavy, we could investigate other options as well:
1) use git submodules
2) use git lfs
The second of these, git lfs, sounds promising: "Git Large File Storage (LFS) replaces large files such as audio samples, videos, datasets, and graphics with text pointers inside Git". We do have a dataset. Moreover, gitlab.com supports LFS.
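To make the "text pointers" part concrete, here is a rough sketch of what Git would store in place of a large capture file, as far as I understand the LFS pointer format (a spec version line, a SHA-256 object id and the size in bytes). The function name `lfs_pointer` and the 10 MB dummy payload are just placeholders for illustration, not anything from our repo, and this only mimics the pointer text; it is not how git-lfs itself is implemented.

```python
import hashlib

# Spec URL that appears on the first line of every LFS pointer file.
LFS_SPEC_URL = "https://git-lfs.github.com/spec/v1"


def lfs_pointer(content: bytes) -> str:
    """Build the small text pointer that would be committed instead of `content`."""
    oid = hashlib.sha256(content).hexdigest()
    return (
        f"version {LFS_SPEC_URL}\n"
        f"oid sha256:{oid}\n"
        f"size {len(content)}\n"
    )


if __name__ == "__main__":
    # Hypothetical 10 MB capture file, used only to compare sizes.
    fake_capture = b"\x00" * 10_000_000
    pointer = lfs_pointer(fake_capture)
    print(pointer)
    print(f"{len(pointer)} bytes in Git instead of {len(fake_capture)}")
```

In practice none of this would be written by hand: as far as I can tell, `git lfs track "<pattern>"` adds a `filter=lfs diff=lfs merge=lfs -text` line to `.gitattributes`, and matching files are then replaced by such pointers on commit. Which patterns we would track is something we would still have to decide.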
Unfortunately I don't have direct experience with either submodules or LFS, so I can't offer more than raw ideas.