Thursday, October 25, 2012

PowerShell HowTo: Creating and deleting crawl rules

The last posts covered creating and modifying content sources; now it's time to fine-tune the actual crawling. Crawl rules specify which content is crawled and how.

Setting the rules...

Let's say you want to crawl all URLs that contain the directory pages/. An appropriate crawl rule is an inclusion rule with the path http://*pages/*, so all URLs matching this pattern will be crawled.
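To get a feel for what the pattern http://*pages/* covers, here is a small sketch that tries it against a few URLs. It assumes that PowerShell's -like operator is a reasonable stand-in for the crawler's wildcard matching, which is only an approximation; the sample URLs are made up for illustration.

```powershell
# Rough illustration only: -like stands in for the crawler's own pattern matching.
$Path = "http://*pages/*"

# Hypothetical sample URLs, not taken from a real farm
$Urls = @(
    "http://intranet/pages/home.aspx",
    "http://intranet/sitepages/news.aspx",
    "http://intranet/docs/report.docx",
    "https://intranet/pages/home.aspx"
)

# Build one result object per URL with the outcome of the wildcard comparison
$Results = foreach ($Url in $Urls) {
    New-Object PSObject -Property @{ Url = $Url; Matches = ($Url -like $Path) }
}

$Results | Format-Table Url, Matches -AutoSize
```

With -like, the first two URLs match and the last two don't: the leading * also lets sitepages/ through, and the https URL fails because the pattern starts with http://. If that is broader or narrower than intended, the path needs adjusting before the rule is created.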

...in code

Here's the code to create such a crawl rule:
  $Path = "http://*pages/*"
  $SearchApp = Get-SPEnterpriseSearchServiceApplication
  # check if crawl rule already exists; if yes: delete
  if ((Get-SPEnterpriseSearchCrawlRule -SearchApplication $SearchApp -Identity $Path -EA SilentlyContinue)) 
  {
    # remove crawl rule; "-confirm:$false" disables confirmation dialog which would otherwise pop up
    Remove-SPEnterpriseSearchCrawlRule -SearchApplication $SearchApp -Identity $Path -confirm:$false
  }

  $Rule = New-SPEnterpriseSearchCrawlRule -SearchApplication $SearchApp -Path $Path -Type InclusionRule -CrawlAsHttp $false -FollowComplexUrls $false
  $Rule.CaseSensitiveURL = $true
  $Rule.Update()

This code first checks if the crawl rule already exists and deletes it if it does. (You could also display an error message instead.) Then the crawl rule is created.
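If you'd rather not delete an existing rule, the check can report it and leave it alone. The following is a minimal sketch of that variant, assuming the SharePoint snap-in is loaded; Add-CrawlRuleIfMissing is a hypothetical helper name, not a SharePoint cmdlet.

```powershell
# Sketch: warn about an existing crawl rule instead of deleting it.
# Add-CrawlRuleIfMissing is a made-up helper name for this example.
function Add-CrawlRuleIfMissing {
    param(
        [Parameter(Mandatory = $true)] $SearchApp,
        [Parameter(Mandatory = $true)] [string] $Path
    )

    # Same existence check as above; errors are suppressed so a missing rule yields $null
    $Existing = Get-SPEnterpriseSearchCrawlRule -SearchApplication $SearchApp -Identity $Path -ErrorAction SilentlyContinue
    if ($Existing) {
        Write-Warning "Crawl rule '$Path' already exists; leaving it unchanged."
        return $Existing
    }

    # No rule found: create it with the same settings as in the post
    New-SPEnterpriseSearchCrawlRule -SearchApplication $SearchApp -Path $Path -Type InclusionRule -CrawlAsHttp $false -FollowComplexUrls $false
}
```

It would be called like this: Add-CrawlRuleIfMissing -SearchApp (Get-SPEnterpriseSearchServiceApplication) -Path "http://*pages/*". Either way you get a rule object back, so the calling code doesn't need to care whether the rule was new.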

Some settings can only be specified when creating the rule with New-SPEnterpriseSearchCrawlRule (like Type), while others can only be set afterwards as properties of the existing rule object (like CaseSensitiveURL).
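The "set afterwards" part also works on a rule that was created earlier. Here is a small sketch, assuming the snap-in is loaded and the rule object exposes CaseSensitiveURL and Update() as used above; Set-CrawlRuleCaseSensitivity is a hypothetical helper name.

```powershell
# Sketch: change a property on an already existing crawl rule.
# Set-CrawlRuleCaseSensitivity is a made-up helper name for this example.
function Set-CrawlRuleCaseSensitivity {
    param(
        [Parameter(Mandatory = $true)] $SearchApp,
        [Parameter(Mandatory = $true)] [string] $Path,
        [bool] $CaseSensitive = $true
    )

    # Look up the rule; a missing rule yields $null instead of an error
    $Rule = Get-SPEnterpriseSearchCrawlRule -SearchApplication $SearchApp -Identity $Path -ErrorAction SilentlyContinue
    if (-not $Rule) {
        Write-Warning "No crawl rule found for '$Path'."
        return
    }

    # Set the property on the object, then persist the change
    $Rule.CaseSensitiveURL = $CaseSensitive
    $Rule.Update()
    $Rule
}
```

As with the creation code, the change only takes effect once Update() has been called on the rule object.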
