Semgrep analyzer
The Semgrep analyzer performs SAST scanning on repositories containing code written in several languages:
- C# (.NET)
- C
- C++
- Go
- Kotlin
- Java
- JavaScript
- Objective-C
- Python
- Ruby
- Scala
- Swift
- TypeScript
The analyzer wraps Semgrep, and is written in Go. It's structured similarly to other Static Analysis analyzers because it uses the shared command package.
The analyzer is built and published as a Docker image in the GitLab Container Registry associated with this repository. You would typically use this analyzer in the context of a SAST job in your CI/CD pipeline. However, if you're contributing to the analyzer or you need to debug a problem, you can run, debug, and test locally using Docker.
For instructions on local development, please refer to the README in Analyzer Scripts.
SAST Rules
The sast-rules repository is the source of truth for the GitLab Semgrep rulesets. Changes to rules should be made in sast-rules. A CI job is responsible for validating and publishing the latest rules, which will eventually be consumed by the Semgrep analyzer here.
Versioning and release process
Please check the versioning and release process documentation.
Failing Integration Tests
Whenever SAST_RULES_VERSION in Dockerfile and Dockerfile.fips is changed, the corresponding integration tests are likely to fail.
This project uses the integration-test tool to perform integration tests; the test logic itself is implemented in RSpec. In essence, integration-test launches the semgrep Docker image and runs the ./spec/semgrep_image_spec.rb file, which runs semgrep (within a Docker container) on a set of test files located in qa/fixtures, generates reports, and compares them against the test expectations in qa/expect. The output of integration-test can be very verbose even for small deviations from the expectations. When a test fails, scroll to the bottom of the job log, where you will find a concise description of the failing test cases.
In addition to analyzing the log, it can be very helpful to regenerate the test expectations to understand the impact of rule changes. To regenerate test expectations, you can use the analyzer-refresh-expected-json script, where TMP_IMAGE can be replaced with the URL that points to the latest image produced by the build tmp image job. Setting the environment variable REFRESH_EXPECTED=true instructs the integration-test tool to regenerate the expectations so that they pass the integration test jobs.
After running the analyzer-refresh-expected-json script in the git root directory of semgrep, you should see the updated test expectation files in your repository. You can then use standard tools such as git diff to better understand the changes. In the diff you may come across :SKIP: placeholders, which are used to exclude certain fields from the baseline comparison. It is advised to use them for fields that are irrelevant to the integration test, such as the id, version, start_time and end_time fields for vulnerabilities.
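To make the idea of a :SKIP: placeholder concrete, here is a minimal Python sketch of a baseline comparison that ignores any field whose expected value is :SKIP:. It is an illustration only and not the implementation used by integration-test (which is written with RSpec); the function name matches and the shape of the sample report are assumptions made for this example.

```python
SKIP = ":SKIP:"  # placeholder marking a field to ignore in the comparison


def matches(expected, actual):
    """Return True if `actual` agrees with `expected`, ignoring :SKIP: fields.

    Hypothetical helper for illustration; not part of integration-test.
    """
    if expected == SKIP:
        # Any actual value is accepted for a skipped field.
        return True
    if isinstance(expected, dict):
        if not isinstance(actual, dict) or expected.keys() != actual.keys():
            return False
        return all(matches(expected[key], actual[key]) for key in expected)
    if isinstance(expected, list):
        if not isinstance(actual, list) or len(expected) != len(actual):
            return False
        return all(matches(e, a) for e, a in zip(expected, actual))
    return expected == actual


# Sample data with an assumed, simplified report layout.
expected = {
    "version": SKIP,
    "vulnerabilities": [{"id": SKIP, "severity": "High"}],
    "scan": {"start_time": SKIP, "end_time": SKIP, "status": "success"},
}
actual = {
    "version": "15.0.0",
    "vulnerabilities": [{"id": "abc123", "severity": "High"}],
    "scan": {
        "start_time": "2024-01-01T00:00:00",
        "end_time": "2024-01-01T00:00:05",
        "status": "success",
    },
}

print(matches(expected, actual))
```

Fields such as timestamps and generated identifiers change on every run, so skipping them keeps the comparison focused on the findings themselves.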
Contributing
Contributions are welcome, see CONTRIBUTING.md for more details.
License
This code is distributed under the MIT Expat license, see the LICENSE file.