Simple Web Scraping Framework based on cURL and str* functions. Requires PHP ^8.0 (can easily be downgraded to PHP 7).
Several helper functions are included to simplify your scripts; they are listed below:
- upname => Formats a string as a name and returns it (the first character of each word is uppercased)
- price => Returns the float value of a price string such as 'xx$: 9.999,99'
- accents => Replaces accented characters with their unaccented equivalents
- strmstr => Returns the rest of the string from start3, which comes after start2, which comes after start1
- strpart => Returns the substring between the start and end strings
- strmpart => Returns the substring between start2 and end, where start2 comes after start1
- Scrapping::cache => Saves or returns a cache entry
- Scrapping::cacheFolder => Sets a custom folder for the cache
- Scrapping::json => Parses the response and prints it as JSON
- Scrapping::isOnSession => Tells whether a session is established with the server
- Scrapping::load => Reuses the session of previous connections
- Scrapping::useSession => Sets/returns whether session handling is enabled
- Scrapping::userAgent => Sets/returns the user agent
- Scrapping::server => Sets/returns the server host base URL
- Scrapping::session => Sets/returns the server session ID
- Scrapping::hasSession => Returns whether a server session ID is set
- Scrapping::sesionName => Sets/returns the server session cookie name
- Scrapping::get => Makes a GET request to the server and returns the response
- Scrapping::post => Makes a POST request to the server and returns the response
- Scrapping::proccess => Processes the GET and POST requests to organize the data
upname(string $text);
Pass the text to format as the parameter; the return value is the formatted string. Some examples below; the comment after each call shows the output:
echo upname('lara vieira');
// 'Lara Vieira'
echo upname('LARA VIEIRA');
// 'Lara Vieira'
echo upname('LEONARDO DE CÁPRIO');
// 'Leonardo de Cáprio'
echo upname('DON PEDRO II');
// 'Don Pedro II'
price(string $text);
Pass the price text as the parameter; the return value is its float value. Some examples below; the comment after each call shows the output:
echo price('US$: 3.567,56');
// 3567.56
echo price('R$: 3.456.234,45');
// 3456234.45
echo price('Price is R$: 234,45');
// 234.45
accents(string $text);
Pass the text to format as the parameter; the return value is the formatted string. An example below; the comment shows the output:
echo accents('Aglomeração, Apóstolo, vô, vó');
// 'Aglomeracao, Apostolo, vo, vo'
strmstr(
string $haystack,
string $start1,
string $start2,
string|null $start3=null
);
If start3 is passed, this function returns the rest of haystack starting at start3, which comes after start2, which comes after start1. Otherwise it returns the rest of haystack starting at start2, which comes after start1. The result includes the last start passed, like strstr.
It works roughly like a stack of strstr calls:
strstr(strstr(strstr(haystack, start1), start2), start3)
Judging by the third example below, each search appears to begin after the previous match rather than at it, so the nested form is only an approximation.
Some examples below; the comment after each call shows the output:
echo strmstr('ABC ABC ABC', 'C');
// 'C ABC ABC'
echo strmstr('ABC ABC ABC', 'B', 'A');
// 'ABC ABC'
echo strmstr('ABC ABC ABC', 'B', 'B', 'A');
// 'ABC'
* This is my favorite one for web scraping.
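To make the "each start is searched after the previous one" behavior concrete, here is a minimal sketch that reproduces the three strmstr outputs above. It is not the library's implementation: the name strmstr_sketch, the variadic parameter list, and the choice to return false when a start is missing (mirroring strstr) are all assumptions made for illustration.

```php
<?php
// Hypothetical re-implementation for illustration only; not the library's code.
function strmstr_sketch(string $haystack, string ...$starts): string|false
{
    $offset = 0;   // where the next search begins
    $pos = false;  // position of the most recent match

    foreach ($starts as $start) {
        $pos = strpos($haystack, $start, $offset);
        if ($pos === false) {
            return false; // assumption: behave like strstr() when a needle is missing
        }
        // Continue searching strictly after this match.
        $offset = $pos + strlen($start);
    }

    // Keep the last matched start in the result, like strstr().
    return $pos === false ? $haystack : substr($haystack, $pos);
}

echo strmstr_sketch('ABC ABC ABC', 'C'), PHP_EOL;           // C ABC ABC
echo strmstr_sketch('ABC ABC ABC', 'B', 'A'), PHP_EOL;      // ABC ABC
echo strmstr_sketch('ABC ABC ABC', 'B', 'B', 'A'), PHP_EOL; // ABC
```

Tracking an offset into the original string avoids building an intermediate substring for every start.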
strpart(
string $haystack,
string|null $start = null,
string|null $end = null,
bool $keep_start = false
);
This function returns the part of haystack between the first occurrence of start and the first occurrence of end after start.
- If start is null, it returns everything in haystack before the first occurrence of end.
- If end is null, it returns everything in haystack after the first occurrence of start.
- If keep_start is set to true (default is false), the function behaves as usual but includes start at the beginning of the result.
Some examples below; the comment after each call shows the output:
echo strpart('ABC ABC ABC', ' ', ' ');
// 'ABC'
echo strpart('ABC ABC ABC', ' ');
// 'ABC ABC'
echo strpart('ABC ABC ABC', end:' ');
// 'ABC'
echo strpart('ABC ABC ABC', ' ', ' ', true);
// ' ABC'
echo strpart('<h2>Subtitle</h2>', '>', '<');
// 'Subtitle'
echo strpart('<div><div>Content</div></div>', '<div>', '</div>');
// '<div>Content'
strmpart(
string $haystack,
string $start1,
string $start2,
string|null $end = null,
bool $keep_start = false
);
This function solves the problem shown in the last strpart example.
It returns the part of haystack between the first occurrence of start2 that comes after the first occurrence of start1, and the first occurrence of end after start2.
- If end is null, it returns everything in haystack after the first occurrence of start2 that follows the first occurrence of start1.
- If keep_start is set to true (default is false), the function behaves as usual but includes start2 at the beginning of the result.
Some examples below; the comment after each call shows the output:
echo strmpart('<div><div>Content</div></div>', '<div>', '<div>', '</div>');
// 'Content'
echo strmpart('<div><div>Content</div></div>', '>', '>', '<');
// 'Content'
echo strmpart('<a id="link1"><h2 id="text1">Content</h2></a>', '<h2', 'id="', '"');
// 'text1'
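The rules described for strpart and strmpart can be sketched with strpos and substr. This is only an illustration that reproduces the documented outputs, not the library's code: the names strpart_sketch and strmpart_sketch are hypothetical, and returning false when a delimiter is not found is an assumption the README does not confirm.

```php
<?php
// Hypothetical sketches for illustration only; not the library's code.
function strpart_sketch(
    string $haystack,
    ?string $start = null,
    ?string $end = null,
    bool $keep_start = false
): string|false {
    $from = 0;       // where the returned part begins
    $searchFrom = 0; // where the search for $end begins

    if ($start !== null) {
        $pos = strpos($haystack, $start);
        if ($pos === false) {
            return false; // assumption: a missing delimiter yields false
        }
        $searchFrom = $pos + strlen($start);
        $from = $keep_start ? $pos : $searchFrom;
    }

    if ($end === null) {
        return substr($haystack, $from);
    }

    // $end is only looked for after $start.
    $to = strpos($haystack, $end, $searchFrom);
    if ($to === false) {
        return false; // assumption, as above
    }
    return substr($haystack, $from, $to - $from);
}

function strmpart_sketch(
    string $haystack,
    string $start1,
    string $start2,
    ?string $end = null,
    bool $keep_start = false
): string|false {
    $pos = strpos($haystack, $start1);
    if ($pos === false) {
        return false;
    }
    // Drop everything up to and including start1, then reuse strpart_sketch.
    $rest = substr($haystack, $pos + strlen($start1));
    return strpart_sketch($rest, $start2, $end, $keep_start);
}

$html = '<div><div>Content</div></div>';
echo strpart_sketch($html, '<div>', '</div>'), PHP_EOL;           // <div>Content
echo strmpart_sketch($html, '<div>', '<div>', '</div>'), PHP_EOL; // Content
```

Delegating from strmpart_sketch to strpart_sketch keeps the two functions consistent: the only difference is that the search window first skips past start1.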