universo-virtual.com

buytrendz.net

thisforall.net

benchpressgains.com

qthzb.com

mindhunter9.com

dwjqp1.com

secure-signup.net

ahaayy.com

soxtry.com

tressesindia.com

puresybian.com

krpano-chs.com

cre8workshop.com

hdkino.org

peixun021.com

qz786.com

utahperformingartscenter.org

maw-pr.com

zaaksen.com

ypxsptbfd7.com

worldqrmconference.com

shangyuwh.com

eejssdfsdfdfjsd.com

playminecraftfreeonline.com

trekvietnamtour.com

your-business-articles.com

essaywritingservice10.com

hindusamaaj.com

joggingvideo.com

wandercoups.com

onlinenewsofindia.com

worldgraphic-team.com

bnsrz.com

wormblaster.net

tongchengchuyange0004.com

internetknowing.com

breachurch.com

peachesnginburlesque.com

dataarchitectoo.com

clientfunnelformula.com

30pps.com

cherylroll.com

ks2252.com

webmanicura.com

osostore.com

softsmob.com

sofietsshotel.com

facetorch.com

nylawyerreview.com

apapromotions.com

shareparelli.com

goeaglepointe.com

thegreenmanpubphuket.com

karotorossian.com

publicsensor.com

taiwandefence.com

epcsur.com

odskc.com

inzziln.info

leaiiln.info

cq-oa.com

dqtianshun.com

southstills.com

tvtv98.com

thewellington-hotel.com

bccaipiao.com

colectoresindustrialesgs.com

shenanddcg.com

capriartfilmfestival.com

replicabreitlingsale.com

thaiamarinnewtoncorner.com

gkmcww.com

mbnkbj.com

andrewbrennandesign.com

cod54.com

luobinzhang.com

bartoysdirect.com

taquerialoscompadresdc.com

aaoodln.info

amcckln.info

drvrnln.info

dwabmln.info

fcsjoln.info

hlonxln.info

kcmeiln.info

kplrrln.info

fatcatoons.com

91guoys.com

signupforfreehosting.com

faithfirst.net

zjyc28.com

tongchengjinyeyouyue0004.com

nhuan6.com

oldgardensflowers.com

lightupthefloor.com

bahamamamas-stjohns.com

ly2818.com

905onthebay.com

fonemenu.com

notanothermovie.com

ukrainehighclassescort.com

meincmagazine.com

av-5858.com

yallerdawg.com

donkeythemovie.com

corporatehospitalitygroup.com

boboyy88.com

miteinander-lernen.com

dannayconsulting.com

officialtomsshoesoutletstore.com

forsale-amoxil-amoxicillin.net

generictadalafil-canada.net

guitarlessonseastlondon.com

lesliesrestaurants.com

mattyno9.com

nri-homeloans.com

rtgvisas-qatar.com

salbutamolventolinonline.net

sportsinjuries.info

topsedu.xyz

xmxm7.com

x332.xyz

sportstrainingblog.com

autopartspares.com

readguy.net

soniasegreto.com

bobbygdavis.com

wedsna.com

rgkntk.com

bkkmarketplace.com

zxqcwx.com

breakupprogram.com

boxcardc.com

unblockyoutubeindonesia.com

fabulousbookmark.com

beat-the.com

guatemala-sailfishing-vacations-charters.com

magie-marketing.com

kingstonliteracy.com

guitaraffinity.com

eurelookinggoodapparel.com

howtolosecheekfat.net

marioncma.org

oliviadavismusic.com

shantelcampbellrealestate.com

shopleborn13.com

topindiafree.com

v-visitors.net

qazwsxedcokmijn.com

parabis.net

terriesandelin.com

luxuryhomme.com

studyexpanse.com

ronoom.com

djjky.com

053hh.com

originbluei.com

baucishotel.com

33kkn.com

intrinsiqresearch.com

mariaescort-kiev.com

mymaguk.com

sponsored4u.com

crimsonclass.com

bataillenavale.com

searchtile.com

ze-stribrnych-struh.com

zenithalhype.com

modalpkv.com

bouisset-lafforgue.com

useupload.com

37r.net

autoankauf-muenster.com

bantinbongda.net

bilgius.com

brabustermagazine.com

indigrow.org

miicrosofts.net

mysmiletravel.com

selinasims.com

spellcubesapp.com

usa-faction.com

snn01.com

hope-kelley.com

bancodeprofissionais.com

zjccp99.com

liturgycreator.com

weedsmj.com

majorelenco.com

colcollect.com

androidnews-jp.com

hypoallergenicdogsnames.com

dailyupdatez.com

foodphotographyreviews.com

cricutcom-setup.com

chprowebdesign.com

katyrealty-kanepa.com

tasramar.com

bilgipinari.org

four-am.com

indiarepublicday.com

inquick-enbooks.com

iracmpi.com

kakaschoenen.com

lsm99flash.com

nana1255.com

ngen-niagara.com

technwzs.com

virtualonlinecasino1345.com

wallpapertop.net

nova-click.com

abeautifulcrazylife.com

diggmobile.com

denochemexicana.com

eventhalfkg.com

medcon-taiwan.com

life-himawari.com

myriamshomes.com

nightmarevue.com

allstarsru.com

bestofthebuckeyestate.com

bestofthefirststate.com

bestwireless7.com

declarationintermittent.com

findhereall.com

jingyou888.com

lsm99deal.com

lsm99galaxy.com

moozatech.com

nuagh.com

patliyo.com

philomenamagikz.net

rckouba.net

saturnunipessoallda.com

tallahasseefrolics.com

thematurehardcore.net

totalenvironment-inthatquietearth.com

velislavakaymakanova.com

vermontenergetic.com

sizam-design.com

kakakpintar.com

begorgeouslady.com

1800birks4u.com

2wheelstogo.com

6strip4you.com

bigdata-world.net

emailandco.net

gacapal.com

jharpost.com

krishnaastro.com

lsm99credit.com

mascalzonicampani.com

sitemapxml.org

thecityslums.net

topagh.com

flairnetwebdesign.com

bangkaeair.com

beneventocoupon.com

noternet.org

oqtive.com

smilebrightrx.com

decollage-etiquette.com

1millionbestdownloads.com

7658.info

bidbass.com

devlopworldtech.com

digitalmarketingrajkot.com

fluginfo.net

naqlafshk.com

passion-decouverte.com

playsirius.com

spacceleratorintl.com

stikyballs.com

top10way.com

yokidsyogurt.com

zszyhl.com

16firthcrescent.com

abogadolaboralistamd.com

apk2wap.com

aromacremeria.com

banparacard.com

bosmanraws.com

businessproviderblog.com

caltonosa.com

calvaryrevivalchurch.org

chastenedsoulwithabrokenheart.com

cheminotsgardcevennes.com

cooksspot.com

cqxzpt.com

deesywig.com

deltacartoonmaps.com

despixelsetdeshommes.com

duocoracaobrasileiro.com

fareshopbd.com

goodpainspills.com

kobisitecdn.com

makaigoods.com

mgs1454.com

piccadillyresidences.com

radiolaondafresca.com

rubendorf.com

searchengineimprov.com

sellmyhrvahome.com

shugahouseessentials.com

sonihullquad.com

subtractkilos.com

valeriekelmansky.com

vipasdigitalmarketing.com

voolivrerj.com

zeelonggroup.com

1015southrockhill.com

10x10b.com

111-online-casinos.com

191cb.com

3665arpentunitd.com

aitesonics.com

bag-shokunin.com

brightotech.com

communication-digitale-services.com

covoakland.org

dariaprimapack.com

freefortniteaccountss.com

gatebizglobal.com

global1entertainmentnews.com

greatytene.com

hiroshiwakita.com

iktodaypk.com

jahatsakong.com

meadowbrookgolfgroup.com

newsbharati.net

platinumstudiosdesign.com

slotxogamesplay.com

strikestaruk.com

trucosdefortnite.com

ufabetrune.com

weddedtowhitmore.com

12940brycecanyonunitb.com

1311dietrichoaks.com

2monarchtraceunit303.com

601legendhill.com

850elaine.com

adieusolasomade.com

andora-ke.com

bestslotxogames.com

cannagomcallen.com

endlesslyhot.com

iestpjva.com

ouqprint.com

pwmaplefest.com

qtylmr.com

rb88betting.com

buscadogues.com

1007macfm.com

born-wild.com

growthinvests.com

promocode-casino.com

proyectogalgoargentina.com

wbthompson-art.com

whitemountainwheels.com

7thavehvl.com

developmethis.com

funkydogbowties.com

travelodgegrandjunction.com

gao-town.com

globalmarketsuite.com

blogshippo.com

hdbka.com

proboards67.com

outletonline-michaelkors.com

kalkis-research.com

thuthuatit.net

buckcash.com

hollistercanada.com

docterror.com

asadart.com

vmayke.org

erwincomputers.com

dirimart.org

okkii.com

loteriasdecehegin.com

mountanalog.com

healingtaobritain.com

ttxmonitor.com

bamthemes.com

nwordpress.com

11bolabonanza.com

avgo.top

How to speed up site migrations with AI-powered redirect mapping - SEO
Thursday, June 19, 2025
spot_img

Top 5 This Week

Related Posts

How to speed up site migrations with AI-powered redirect mapping

Migrating a large website is always daunting. Big traffic is at stake among many moving parts, technical challenges and stakeholder management.

Historically, one of the most onerous tasks in a migration plan has been redirect mapping. The painstaking process of matching URLs on your current site to the equivalent version on the new website.

Fortunately, this task that previously could involve teams of people combing through thousands of URLs can be drastically sped up with modern AI models.

Should you use AI for redirect mapping?

The term “AI” has become someone conflated with “ChatGPT” over the last year, so to be very clear from the outset, we are not talking about using generative AI/LLM-based systems to do your redirect mapping. 

While there are some tasks that tools like ChatGPT can assist you with, such as writing that tricky regex for the redirect logic, the generative element that can cause hallucinations could potentially create accuracy issues for us.

Advantages of using AI for redirect mapping

Speed

The primary advantage of using AI for redirect mapping is the sheer speed at which it can be done. An initial map of 10,000 URLs could be produced within a few minutes and human-reviewed within a few hours. Doing this process manually for a single person would usually be days of work.

Scalability

Using AI to help map redirects is a method you can use on a site with 100 URLs or over 1,000,000. Large sites also tend to be more programmatic or templated, making similarity matching more accurate with these tools.

Efficiency

For larger sites, a multi-person job can easily be handled by a single person with the correct knowledge, freeing up colleagues to assist with other parts of the migration.

Accuracy

While the automated method will get some redirects “wrong,” in my experience, the overall accuracy of redirects has been higher, as the output can specify the similarity of the match, giving manual reviewers a guide on where their attention is most needed

Disadvantages of using AI for redirect mapping

Over-reliance

Using automation tools can make people complacent and over-reliant on the output. With such an important task, a human review is always required.

Training

The script is pre-written and the process is straightforward. However, it will be new to many people and environments such as Google Colab can be intimidating.

Output variance 

While the output is deterministic, the models will perform better on certain sites than others. Sometimes, the output can contain “silly” errors, which are obvious for a human to spot but harder for a machine.

A step-by-step guide for URL mapping with AI

By the end of this process, we are aiming to produce a spreadsheet that lists “from” and “to” URLs by mapping the origin URLs on our live website to the destination URLs on our staging (new) website.

For this example, to keep things simple, we will just be mapping our HTML pages, not additional assets such as CSS or images, although this is also possible.

Tools we’ll be using

  • Screaming Frog Website Crawler: A powerful and flexible website crawler, Screaming Frog is how we collect the URLs and associated metadata we need for the matching.
  • Google Colab: A free cloud service that uses a Jupyter notebook environment, allowing you to run a range of languages directly from your browser without having to install anything locally. Google Colab is how we are going to run our Python scripts to perform the URL matching.
  • Automated Redirect Matchmaker for Site Migrations: The Python script by Daniel Emery that we’ll be running in Colab.

Step 1: Crawl your live website with Screaming Frog

You’ll need to perform a standard crawl on your website. Depending on how your website is built, this may or may not require a JavaScript crawl. The goal is to produce a list of as many accessible pages on your site as possible.

Crawl your live website with Screaming FrogCrawl your live website with Screaming Frog

Step 2: Export HTML pages with 200 Status Code

Once the crawl has been completed, we want to export all of the found HTML URLs with a 200 Status Code.

Firstly, in the top left-hand corner, we need to select “HTML” from the drop-down menu.

Screaming Frog - Highlighted- HTML filterScreaming Frog - Highlighted- HTML filter

Next, click the sliders filter icon in the top right and create a filter for Status Codes containing 200.

Highlighted: Custom filter optionsHighlighted: Custom filter options

Finally, click on Export to save this data as a CSV.

Highlighted: Export buttonHighlighted: Export button

This will provide you with a list of our current live URLs and all of the default metadata Screaming Frog collects about them, such as Titles and Header Tags. Save this file as origin.csv.

Important note: Your full migration plan needs to account for things such as existing 301 redirects and URLs that may get traffic on your site that are not accessible from an initial crawl. This guide is intended only to demonstrate part of this URL mapping process, it is not an exhaustive guide.

Step 3: Repeat steps 1 and 2 for your staging website

We now need to gather the same data from our staging website, so we have something to compare to.

Depending on how your staging site is secured, you may need to use features such as Screaming Frog’s forms authentication if password protected.

Once the crawl has completed, you should export the data and save this file as destination.csv.

Optional: Find and replace your staging site domain or subdomain to match your live site

It’s likely your staging website is either on a different subdomain, TLD or even domain that won’t match our actual destination URL. For this reason, I will use a Find and Replace function on my destination.csv to change the path to match the final live site subdomain, domain or TLD.

For example:

  • My live website is https://withcandour.co.uk/ (origin.csv)
  • My staging website is https://testing.withcandour.dev/ (destination.csv)
  • The site is staying on the same domain; it’s just a redesign with different URLs, so I would open destination.csv and find any instance of https://testing.withcandour.dev and replace it with https://withcandour.co.uk.
Find and Replace in ExcelFind and Replace in Excel

This also means when the redirect map is produced, the output is correct and only the final redirect logic needs to be written.

Step 4: Run the Google Colab Python script

When you navigate to the script in your browser, you will see it is broken up into several code blocks and hovering over each one will give you a”play” icon. This is if you wish to execute one block of code at a time.

However, the script will work perfectly just executing all of the code blocks, which you can do by going to the Runtime’menu and selecting Run all.

Google Colab RuntimeGoogle Colab Runtime

There are no prerequisites to run the script; it will create a cloud environment and on the first execution in your instance, it will take around one minute to install the required modules.

Each code block will have a small green tick next to it once it is complete, but the third code block will require your input to continue and it’s easy to miss as you’ll likely need to scroll down to see the prompt.

Get the daily newsletter search marketers rely on.


See terms.


Step 5: Upload origin.csv and destination.csv

Highlighted: File upload promptHighlighted: File upload prompt

When prompted, click Choose files and navigate to where you saved your origin.csv file. Once you have selected this file, it will upload and you will be prompted to do the same for your destination.csv.

Step 6: Select fields to use for similarity matching

What makes this script particularly powerful is the ability to use multiple sets of metadata for your comparison.

This means if you’re in a situation where you’re moving architecture where your URL Address is not comparable, you can run the similarity algorithm on other factors under your control, such as Page Titles or Headings.

Have a look at both sites and try and judge what you think are elements that remain fairly consistent between them. Generally, I would advise to start simple and add more fields if you are not getting the results you want.

In my example, we have kept a similar URL naming convention, although not identical and our page titles remain consistent as we are copying the content over.

Select the elements you to use and click the Let’s Go!

Similarity matching fieldsSimilarity matching fields

Step 7: Watch the magic

The script’s main components are all-MiniLM-L6-v2 and FAISS, but what are they and what are they doing?

all-MiniLM-L6-v2 is a small and efficient model within the Microsoft series of MiniLM models which are designed for natural language processing tasks (NLP). MiniLM is going to convert our text data we’ve given it into numerical vectors that capture their meaning.

These vectors then enable the similarity search, performed by Facebook AI Similarity Search (FAISS), a library developed by Facebook AI Research for efficient similarity search and clustering of dense vectors. This will quickly find our most similar content pairs across the dataset.

Step 7: Download output.csv and sort by similarity_score

The output.csv should automatically download from your browser. If you open it, you should have three columns: origin_url, matched_url and similarity_score.

Output csv exampleOutput csv example

In your favorite spreadsheet software, I would recommend sorting by similarity_score

Excel Sort by similarity scoreExcel Sort by similarity score

The similarity score gives you an idea of how good the match is. A similarity score of 1 suggests an exact match.

By checking my output file, I immediately saw that approximately 95% of my URLs have a similarity score of more than 0.98, so there is a good chance I’ve saved myself a lot of time.

Step 8: Human-validate your results

Pay special attention to the lowest similarity scores on your sheet; this is likely where no good matches can be found.

Output.csv: Lower-scored similaritiesOutput.csv: Lower-scored similarities

In my example, there were some poor matches on the team page, which led me to discover not all of the team profiles had yet been created on the staging site – a really helpful find.

The script has also quite helpfully given us redirect recommendations for old blog content we decided to axe and not include on the new website, but now we have a suggested redirect should we want to pass the traffic to something related – that’s ultimately your call.

Step 9: Tweak and repeat

If you didn’t get the desired results, I would double-check that the fields you use for matching are staying as consistent as possible between sites. If not, try a different field or group of fields and rerun.

More AI to come

In general, I have been slow to adopt any AI (especially generative AI) into the redirect mapping process, as the cost of mistakes can be high, and AI errors can sometimes be tricky to spot.

However, from my testing, I’ve found these specific AI models to be robust for this particular task and it has fundamentally changed how I approach site migrations. 

Human checking and oversight are still required, but the amount of time saved with the bulk of the work means you can do a more thorough and thoughtful human intervention and finish the task many hours ahead of where you would usually be.

In the not-too-distant future, I expect we’ll see more specific models that will allow us to take additional steps, including improving the speed and efficiency of the next step, the redirect logic.


Contributing authors are invited to create content for Search Engine Land and are chosen for their expertise and contribution to the search community. Our contributors work under the oversight of the editorial staff and contributions are checked for quality and relevance to our readers. The opinions they express are their own.


Popular Articles