Advanced text editing with Regular Expressions (Regex)

17/11/2024

Understand how to use regular expressions (regex) to efficiently split texts.

Explanation - English

In this tutorial, we will learn how to split texts based on specific patterns using regular expressions in JavaScript. This allows us to perform a range of string manipulation operations, such as:

  • Split by the pattern: Splits the text whenever the pattern appears anywhere in the line.
  • Split immediately before the pattern: Splits the text immediately before the pattern.
  • Create blocks before and after the pattern: Creates blocks of text before and after the pattern without removing it.
  • Remove the pattern: Splits the text using the pattern and removes it from the blocks created.
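
As a rough sketch, three of these operations map onto JavaScript's built-in String.prototype.split. The helper names below (tidy, splitRemoving, splitBefore, splitKeeping) are hypothetical, and trimming blocks and dropping empty ones is an assumption; the tutorial does not say how any particular tool implements its break types.

```javascript
// Hypothetical helpers for illustration only.
// Each takes the pattern as a string that has no capturing groups of its own.

// Trim each block and drop empty ones (an assumed clean-up step).
const tidy = (blocks) => blocks.map((b) => b.trim()).filter((b) => b !== "");

// Split at every match and discard the matched text.
function splitRemoving(text, pattern) {
  return tidy(text.split(new RegExp(pattern)));
}

// Split at the position just before a match: the lookahead (?=...) is
// zero-width, so nothing is consumed and the match stays in the next block.
function splitBefore(text, pattern) {
  return tidy(text.split(new RegExp(`(?=${pattern})`)));
}

// Split at every match but keep it: split() includes captured groups
// in its result, so each match becomes a block of its own.
function splitKeeping(text, pattern) {
  return tidy(text.split(new RegExp(`(${pattern})`)));
}

const sample = "a 1 b 2 c";
console.log(splitRemoving(sample, "\\d")); // [ 'a', 'b', 'c' ]
console.log(splitBefore(sample, "\\d"));   // [ 'a', '1 b', '2 c' ]
console.log(splitKeeping(sample, "\\d"));  // [ 'a', '1', 'b', '2', 'c' ]
```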

Examples of Regular Expressions

Example 1

Text: Alice: 21 years Bob: 35 years Charlie: 42 years, Daiane: 25 years

Regex: \d{2}

Break Type: Splits by the pattern if it appears anywhere in the line

Result:
Alice:
years
Bob:
years
Charlie:
years, Daiane:
years

Explanation: The regular expression "\d{2}" finds two-digit numbers anywhere in the text. "\d" matches any digit from 0 to 9, and "{2}" requires exactly two of them in a row, so the pattern matches every two-digit age in the text. Because the pattern has no boundaries, it would also match the first two digits of a longer number such as 105.

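
The result above can be approximated with split() applied line by line. This is only a sketch: the line breaks in the input are an assumption inferred from the result (the text is shown here on a single line), and trimming the blocks is assumed as well.

```javascript
// Assumed input lines; the original line breaks are not shown in the text above.
const lines = [
  "Alice: 21 years",
  "Bob: 35 years",
  "Charlie: 42 years, Daiane: 25 years",
];

// Split each line at every two-digit match and discard the match.
const pieces = lines.flatMap((line) =>
  line
    .split(/\d{2}/)
    .map((piece) => piece.trim())
    .filter((piece) => piece !== "")
);
console.log(pieces);
// [ 'Alice:', 'years', 'Bob:', 'years', 'Charlie:', 'years, Daiane:', 'years' ]

// \d{2} has no boundaries, so it also matches inside a longer number.
const longer = "Age: 105".split(/\d{2}/);
console.log(longer); // [ 'Age: ', '5' ]
```
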
Example 2

Text: Emails: - alice@example.com - bob@example.com - charlie123@example.com or charlie4@exemplo.br

Regex: [a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}

Break Type: Splits immediately before the pattern

Result:
Emails:
-
a
l
i
c
e@example.com
-
b
o
b@example.com
-
c
h
a
r
l
i
e
1
2
3@example.com or
c
h
a
r
l
i
e
4@exemplo.br
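
Explanation: The regular expression "[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}" finds email addresses. It is composed of several parts:
  • "[a-zA-Z0-9._%+-]+" matches the local part (before the @), which can contain letters, digits, and the characters . _ % + and -.
  • "@" matches the symbol separating the local part from the domain.
  • "[a-zA-Z0-9.-]+" matches the domain name.
  • "\.[a-zA-Z]{2,}" matches the top-level domain (like .com or .org): a dot followed by at least two letters.
In the result, each address is broken before every character of its local part. This is the expected outcome if the split point is any position where the pattern can begin, because "lice@example.com", "ice@example.com", and so on are also matches.
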
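
The character-by-character result can be reproduced with a zero-width lookahead, which is one plausible way to split "immediately before" a pattern (an assumption; the tutorial does not show the implementation). The second variant, which adds a word boundary, is a suggested refinement and not part of the original example; it would still split inside a local part that contains a dot or another non-word character.

```javascript
// The email pattern from Example 2, kept as a string so it can be embedded.
const email = "[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}";
const text = "- alice@example.com - bob@example.com";

// Plain lookahead: splits wherever an email match can begin, which
// includes every character of the local part.
const perCharacter = text.split(new RegExp(`(?=${email})`));
console.log(perCharacter);
// [ '- ', 'a', 'l', 'i', 'c', 'e@example.com - ', 'b', 'o', 'b@example.com' ]

// Assumed refinement: also require a word boundary at the split point,
// so the split happens only where a word (and the address) starts.
const perAddress = text.split(new RegExp(`\\b(?=${email})`));
console.log(perAddress);
// [ '- ', 'alice@example.com - ', 'bob@example.com' ]
```
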
Example 3

Text: Event dates: 2022-03-25 bla bla 2022-10-02 bla bla bla 2023-06-12 2024-08-05

Regex: \d{4}-\d{2}-\d{2}

Break Type: Creates blocks before and after the pattern (without removing the pattern)

Result:
Event dates:
2022-03-25
bla bla
2022-10-02
bla bla bla
2023-06-12
2024-08-05

Explanation: The regular expression "\d{4}-\d{2}-\d{2}" finds dates in the format YYYY-MM-DD. It consists of:
  • "\d{4}" matches a 4-digit year.
  • "-\d{2}-\d{2}" matches the month and the day, each with two digits and preceded by a hyphen.
The expression finds every date in that format and keeps each one as its own block instead of removing it. It checks only the shape of the date, not whether the month or day is valid.

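
One way to get blocks before and after the pattern while keeping the pattern is to wrap it in a capturing group, since split() then returns the captured text as separate elements. The sketch below assumes that blocks are trimmed and empty ones dropped.

```javascript
const text =
  "Event dates: 2022-03-25 bla bla 2022-10-02 bla bla bla 2023-06-12 2024-08-05";

// The parentheses make the date a captured group, so split() returns
// each date as its own element instead of discarding it.
const blocks = text
  .split(/(\d{4}-\d{2}-\d{2})/)
  .map((block) => block.trim())
  .filter((block) => block !== ""); // assumed clean-up of empty blocks

console.log(blocks);
// [ 'Event dates:', '2022-03-25', 'bla bla', '2022-10-02',
//   'bla bla bla', '2023-06-12', '2024-08-05' ]
```
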
Example 4

Text: Price list: Bla bla Product A: $50.00 bla bla bla bla Product B: $35.99 Product C: $12.75 bla bla Product D:$54,00

Regex: \$\d+\.\d{2}

Break Type: Uses the pattern as a divisor and removes it from the blocks

Result:
Price list:
Bla bla Product A: bla bla bla bla
Product B:
Product C: bla bla Product D:$54,00
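
Explanation: The regular expression "\$\d+\.\d{2}" finds dollar prices in the format $XX.XX. It uses:
  • "\$" to match a literal dollar sign ($); the backslash is needed because an unescaped "$" means end of input (or end of line).
  • "\d+" to match one or more digits before the decimal point.
  • "\.\d{2}" to match a decimal point followed by exactly two digits.
"$54,00" stays in the result because it uses a comma instead of a decimal point, so the pattern does not match it.
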
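
The sketch below lists what the pattern matches and then splits on it. It uses the text as printed above, on a single line, so the block boundaries need not coincide with the result listing, which presumably depends on line breaks in the original input.

```javascript
const text =
  "Price list: Bla bla Product A: $50.00 bla bla bla bla Product B: $35.99 " +
  "Product C: $12.75 bla bla Product D:$54,00";

const price = /\$\d+\.\d{2}/g;

// Every substring the pattern matches; "$54,00" is absent because of the comma.
const matches = text.match(price);
console.log(matches); // [ '$50.00', '$35.99', '$12.75' ]

// Splitting on the pattern removes the matches from the blocks.
const blocks = text.split(price).map((block) => block.trim());
console.log(blocks);
// [ 'Price list: Bla bla Product A:', 'bla bla bla bla Product B:',
//   'Product C:', 'bla bla Product D:$54,00' ]
```
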
Example 5

Text: Codes: AB-123-CD EF-456-GH bla bla EF-456-AH bla 1F-456-GH IJ-789-KL

Regex: [A-Z]{2}-\d{3}-[A-Z]{2}

Break Type: Splits by the pattern if it appears anywhere in the line

Result:
Codes:
bla bla
bla 1F-456-GH

Explanation: The regular expression "[A-Z]{2}-\d{3}-[A-Z]{2}" finds codes in the format AB-123-CD. It works as follows:
  • "[A-Z]{2}" matches two uppercase letters.
  • "-\d{3}-" matches a hyphen, three digits, and another hyphen.
  • "[A-Z]{2}" matches two uppercase letters after the final hyphen.
"1F-456-GH" stays in the result because it starts with a digit, so it does not match.

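
A minimal sketch of this split, assuming that blocks are trimmed and empty ones (such as the gap between two adjacent codes) are dropped:

```javascript
const text =
  "Codes: AB-123-CD EF-456-GH bla bla EF-456-AH bla 1F-456-GH IJ-789-KL";

// The codes the pattern recognizes; "1F-456-GH" is not among them.
const codes = text.match(/[A-Z]{2}-\d{3}-[A-Z]{2}/g);
console.log(codes); // [ 'AB-123-CD', 'EF-456-GH', 'EF-456-AH', 'IJ-789-KL' ]

// Split at every code, discarding the codes themselves.
const blocks = text
  .split(/[A-Z]{2}-\d{3}-[A-Z]{2}/)
  .map((block) => block.trim())
  .filter((block) => block !== "");
console.log(blocks); // [ 'Codes:', 'bla bla', 'bla 1F-456-GH' ]
```
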
Example 6

Text: URLs: Bla bla https://site.com Bla http://example.com bla https://www.openai.com

Regex: https?://[\w.-]+

Break Type: Creates blocks before and after the pattern (without removing the pattern)

Result:
URLs:
Bla bla
https://site.com
Bla
http://example.com
bla
https://www.openai.com

Explanation: The regular expression "https?://[\w.-]+" finds URLs that start with http or https.
  • "https?": the "s?" makes the "s" after http optional.
  • "://[\w.-]+": matches "://" followed by the host name, which can contain letters, digits, underscores, dots, and hyphens. The match stops at the first character outside that set, so a path such as /page is not included.
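
As with the dates, a capturing group is one way to keep each URL as its own block. The sketch uses the text on a single line, so "URLs:" and "Bla bla" stay together in the first block; in the result above they are separate, presumably because of a line break in the original input. The URL https://site.com/docs in the last step is a made-up value for illustration.

```javascript
const text =
  "URLs: Bla bla https://site.com Bla http://example.com bla https://www.openai.com";

// In a regex literal the slashes must be escaped as \/.
const url = /(https?:\/\/[\w.-]+)/;

const blocks = text
  .split(url)
  .map((block) => block.trim())
  .filter((block) => block !== ""); // assumed clean-up of empty blocks
console.log(blocks);
// [ 'URLs: Bla bla', 'https://site.com', 'Bla',
//   'http://example.com', 'bla', 'https://www.openai.com' ]

// The character class has no "/", so a path is left out of the match.
const host = "https://site.com/docs".match(/https?:\/\/[\w.-]+/)[0];
console.log(host); // https://site.com
```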
