github.com/gocolly/colly
This package has a good security score with no known vulnerabilities.
Community Reviews
Powerful scraper with convenience over security concerns
The library hasn't been updated since 2019, which is a red flag for dependency supply-chain management. It pins older versions of golang.org/x/net and other crypto-adjacent dependencies that may carry unpatched vulnerabilities. TLS configuration requires manual hardening: the defaults don't enforce modern cipher suites or the certificate-validation patterns I'd want in production. Error handling often exposes full URLs and response details in logs, so you need wrapper code to sanitize output.
Input validation is largely left to the developer. When scraping untrusted sites, you need to implement your own sanitization for extracted data and URL handling. The library doesn't provide secure-by-default patterns for preventing SSRF attacks or handling redirects to internal networks, making it risky for user-supplied URL scenarios without additional safeguards.
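One such safeguard is a pre-flight check that rejects URLs resolving to internal addresses before handing them to the collector. The function below is illustrative (the name and policy are not part of colly); note that to be effective against redirect-based SSRF, the same check must also be applied to redirect targets:

```go
package main

import (
	"fmt"
	"net"
	"net/url"
)

// checkPublicHost rejects URLs whose host resolves to loopback,
// private, or link-local addresses -- the kind of guard colly does
// not provide out of the box. Illustrative sketch, not library API.
func checkPublicHost(raw string) error {
	u, err := url.Parse(raw)
	if err != nil {
		return err
	}
	if u.Scheme != "http" && u.Scheme != "https" {
		return fmt.Errorf("unsupported scheme %q", u.Scheme)
	}
	ips, err := net.LookupIP(u.Hostname())
	if err != nil {
		return err
	}
	for _, ip := range ips {
		if ip.IsLoopback() || ip.IsPrivate() || ip.IsLinkLocalUnicast() {
			return fmt.Errorf("host %s resolves to internal address %s", u.Hostname(), ip)
		}
	}
	return nil
}

func main() {
	// A loopback target is rejected before any request is made.
	fmt.Println(checkPublicHost("http://127.0.0.1/admin") != nil) // true
}
```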
Best for: Internal tools scraping trusted, known websites where dependency age is acceptable and security posture is lower risk.
Avoid if: You're processing user-supplied URLs, need modern TLS standards, or require actively maintained dependencies for compliance.
Intuitive web scraping with callback-based API, but showing its age
The type system works well with Go's patterns - collector configuration through functional options is clean, and while there's no generics support (predating Go 1.18), the string-based selectors and interface{} returns are manageable. Error handling could be more granular; you often get generic HTTP errors that require additional debugging. The documentation has solid examples but lacks depth on advanced scenarios like handling complex authentication flows or debugging rate limiting issues.
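The functional-options style praised above looks roughly like this. `AllowedDomains` and `MaxDepth` mirror real colly option names, but the trimmed-down types here are a generic illustration of the pattern, not the library's code:

```go
package main

import "fmt"

// Collector is a pared-down stand-in for colly's collector type,
// used only to illustrate the functional-options pattern.
type Collector struct {
	AllowedDomains []string
	MaxDepth       int
}

// Option mutates a Collector during construction.
type Option func(*Collector)

func AllowedDomains(domains ...string) Option {
	return func(c *Collector) { c.AllowedDomains = domains }
}

func MaxDepth(d int) Option {
	return func(c *Collector) { c.MaxDepth = d }
}

// NewCollector applies each option to a default-configured collector.
func NewCollector(opts ...Option) *Collector {
	c := &Collector{MaxDepth: 0} // 0 = unlimited depth by default
	for _, o := range opts {
		o(c)
	}
	return c
}

func main() {
	c := NewCollector(AllowedDomains("example.com"), MaxDepth(2))
	fmt.Println(c.MaxDepth, c.AllowedDomains) // 2 [example.com]
}
```

The payoff of this pattern is that new configuration knobs can be added without breaking existing `NewCollector` call sites.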
The library hasn't seen updates since 2019, which shows in missing features like modern context support and some edge cases with JavaScript-heavy sites. Despite this, it remains highly functional for traditional server-rendered HTML scraping and continues to work reliably in production environments.
Best for: Scraping traditional server-rendered HTML sites where you need a clean, maintainable codebase with minimal setup overhead.
Avoid if: You need to scrape JavaScript-heavy SPAs or require active maintenance and modern Go patterns like structured context handling.
Intuitive scraping API with minimal boilerplate, but aging toolchain
The documentation includes practical examples for common scenarios like form submission, authentication, and rate limiting. Request queueing and parallelization work out of the box with sensible defaults. Error messages are generally clear, though network failures sometimes require digging into underlying http.Client errors.
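The parallelization-with-delay behavior described above boils down to a counting semaphore plus a per-slot pause; in colly itself it is configured with `c.Limit(&colly.LimitRule{DomainGlob: "*", Parallelism: 2, Delay: time.Second})`. A stdlib sketch of the same idea, with a stubbed-out fetch function:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// fetchAll runs fetch over urls with at most `parallelism` concurrent
// workers, each pausing `delay` before releasing its slot -- a rough
// stdlib approximation of colly's LimitRule semantics.
func fetchAll(urls []string, parallelism int, delay time.Duration, fetch func(string)) {
	sem := make(chan struct{}, parallelism) // counting semaphore
	var wg sync.WaitGroup
	for _, u := range urls {
		wg.Add(1)
		sem <- struct{}{} // acquire a slot
		go func(u string) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot
			fetch(u)
			time.Sleep(delay)
		}(u)
	}
	wg.Wait()
}

func main() {
	var mu sync.Mutex
	var seen []string
	fetchAll([]string{"a", "b", "c"}, 2, 10*time.Millisecond, func(u string) {
		mu.Lock()
		seen = append(seen, u)
		mu.Unlock()
	})
	fmt.Println(len(seen)) // 3
}
```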
The main drawback is that the package hasn't seen updates since 2019, leaving it on older Go module conventions and without modern context-handling patterns. You'll hit rough edges with Go 1.18+ generics and newer stdlib features. Despite this, the core functionality remains solid for standard scraping tasks, and the codebase is stable enough that the lack of updates isn't necessarily a problem for production use.
Best for: Building reliable web scrapers for static HTML sites where you need a clean API with built-in concurrency and rate limiting.
Avoid if: You need to scrape JavaScript-heavy SPAs or require a package with active maintenance and modern Go idioms.