Wikidata:Report a technical problem/WDQS and Search

item returns 2 different schema:dateModified values

Moved here from Wikidata:Report a technical problem. Lucas Werkmeister (WMDE) (talk) 12:18, 6 June 2025 (UTC)

For Royal Navy vessels, I observe this problem for 3 out of 7,500 items.

SELECT DISTINCT ?item ?itemDescription ?modified WHERE {
  VALUES ?item {wd:Q1297941}
  SERVICE wikibase:label {bd:serviceParam wikibase:language "en-gb,mul,en"}
  ?item schema:dateModified ?modified .
  FILTER(BOUND(?modified) && DATATYPE(?modified) = xsd:dateTime).
}
GROUP BY ?item ?itemDescription ?modified

returns 21 February 2024 and 7 March 2025. Only the latter is correct. How can it be storing two values? Vicarage (talk) 11:04, 6 June 2025 (UTC)

Now only returning one. Another example of the rogue query server? Vicarage (talk) 06:27, 9 June 2025 (UTC)
Tried it again and got two responses again; is the rogue server still there? Vicarage (talk) 16:25, 11 June 2025 (UTC)
@Vicarage thanks for reporting this. I confirm that there are still two WDQS servers misbehaving here (wdqs1017 & wdqs1021). I believe this might be related to phab:T386098, and looking at the progress of the data transfer I can see that these servers have not yet been reloaded. I'll raise this problem on the ticket.
As to why it happens: if a WDQS machine de-synchronizes from the update stream, this kind of inconsistency may appear in its results. DCausse (WMF) (talk) 10:31, 14 August 2025 (UTC)
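A minimal diagnostic sketch along these lines (assuming a desynchronized server keeps both the old and the new triple; Q1297941 is the item from the report above):

SELECT ?item (COUNT(DISTINCT ?modified) AS ?dates) WHERE {
  VALUES ?item { wd:Q1297941 }   # the affected vessel
  ?item schema:dateModified ?modified .
}
GROUP BY ?item
HAVING (COUNT(DISTINCT ?modified) > 1)

An item should have exactly one modification date, so any row returned here means the request happened to hit a server that is answering with stale data.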
Should be all resolved. If you have trouble getting a response from us in the future (sometimes we forget to check this page as frequently as would be ideal), feel free to drop into #wikimedia-search on Libera Chat IRC and raise the inquiry there. RKemper (WMF) (talk) 22:06, 26 August 2025 (UTC)
This is Brian; I'm part of the SRE team that supports WDQS. Sorry for the delay on this! We are working through the hosts on https://etherpad.wikimedia.org/p/wdqs-reload-T386098 and should be done within the next couple of days. If you continue to see issues, feel free to ping us directly in the linked Phab task. BKing (WMF) (talk) 21:28, 20 August 2025 (UTC)

upstream request timeout

I'm getting this response and a 60-second timeout from https://query.wikidata.org/sparql for what seem like quite innocuous queries. The message only started appearing in the last couple of days, and it replaces the full SPARQL traceback I used to get. Re-runs a few minutes later often work, with response times under 10 seconds. Vicarage (talk) 16:24, 11 June 2025 (UTC)

And what times out from the command line completes in 6 seconds from the website. Vicarage (talk) 20:42, 11 June 2025 (UTC)
@Vicarage do you still see the issue? If yes, could you elaborate a bit more on what you are trying to achieve? What is the SPARQL query? What do you mean by command line? If you are using curl, could you paste the full command you are running? DCausse (WMF) (talk) 15:13, 13 August 2025 (UTC)
From the web page,
SELECT DISTINCT
?item
?itemDescription
WHERE {
SERVICE wikibase:label {bd:serviceParam wikibase:language "en-gb,mul,en"}
VALUES ?item {wd:Q182027}
}
times out today (or takes 28 seconds!), even after I cut all the real content from the query. Vicarage (talk) 09:24, 15 August 2025 (UTC)
@Vicarage thanks for the information.
I can't seem to reproduce the issue. I ran some load testing using this query on both datacenters, and it always returns results in less than 1 second. By any chance, did you hear of any other users facing similar problems with this particular query?
What could explain the problem is that some WDQS servers might enter a deadlock and cause some queries to fail, but usually the problematic servers are automatically removed so that they no longer serve queries until they're restarted and put back in rotation. DCausse (WMF) (talk) 08:19, 18 August 2025 (UTC)
Sorry, I don't really talk to others about their SPARQL problems. Another problem that may be due to inconsistent servers: this query, which should always return one result, randomly returned 1, 2 or 3 results each time I hit submit yesterday and today from the web page, as if I'm getting a different server in round-robin.
SELECT DISTINCT
?item
(COALESCE(?label1,SAMPLE(?label2),'Unknown') AS ?title)
?itemDescription
(GROUP_CONCAT(DISTINCT ?alias; separator="#") AS ?aliases)
(SAMPLE(?country1) AS ?country)
(MAX(?modified1) AS ?modified)
(MIN(?start1) AS ?start)
(MAX(?end1) AS ?end)
(GROUP_CONCAT(DISTINCT ?conn; SEPARATOR='#') AS ?connlist)
(GROUP_CONCAT(DISTINCT ?hconn; SEPARATOR='#') AS ?hconnlist)
(GROUP_CONCAT(DISTINCT ?tag; SEPARATOR='#') AS ?taglist)
(GROUP_CONCAT(DISTINCT ?note; SEPARATOR=', ') AS ?notelist)
(GROUP_CONCAT(DISTINCT ?lnote; SEPARATOR=', ') AS ?lnotelist)
(SAMPLE (COALESCE(?position1,?position2,?position3)) AS ?position)
(SAMPLE (STR(?bestimage)) AS ?image)
WHERE {
SERVICE wikibase:label {bd:serviceParam wikibase:language "en-gb,mul,en"}
VALUES ?item {wd:Q150609}
VALUES ?details {
wdt:P279 # subclass
}
VALUES ?hdetails {wdt:P137 wdt:P176} # operator, manufacturer
VALUES ?idetails {wdt:P7906}
VALUES ?tags {wdt:P31} # instance
VALUES ?starts {wdt:P580 wdt:P571 wdt:P729} # start time, inception, service entry
VALUES ?ends {wdt:P582 wdt:P730} # end time, service retirement
VALUES ?countries {wdt:P495} # origin
VALUES ?images {wdt:P7906}
VALUES ?registers {wd:Q22964288} # military
OPTIONAL {
SERVICE wikibase:label {bd:serviceParam wikibase:language "en-gb,mul,en".
?item rdfs:label ?label1}. FILTER (!REGEX(?label1,"^[Q][0-9]"))
}
OPTIONAL {
?item ?countries ?country.
?country wdt:P37/wdt:P424 ?langcode.
?item rdfs:label ?label2 FILTER (LANG(?label2) = ?langcode)
}
OPTIONAL {?item skos:altLabel ?alias. FILTER(LANG(?alias) = "en")}
OPTIONAL {{?item ?hdetails ?t2} ?t2 rdfs:label ?n2. FILTER (LANG(?n2) = 'en') BIND(CONCAT(?n2,'£',STR(?t2)) AS ?hconn)}
}
GROUP BY ?item ?itemDescription ?label1 ?label2 ?wikipedia ?ia
Vicarage (talk) 09:05, 18 August 2025 (UTC)
@Vicarage in this last query, no, the problem is not due to inconsistent WDQS servers but to the SPARQL features you are using. Some SPARQL features such as GROUP_CONCAT or SAMPLE are not deterministic; in other words, they might return different results even when given the same arguments.
What is likely happening here is that the de-duplication you are requesting via DISTINCT is not de-duplicating the way you expect. The item Q150609 you are requesting is duplicated, almost certainly because of some of your triple patterns, but the GROUP_CONCAT on ?hconn may concatenate its values in a different order for the two duplicates, which in turn tells DISTINCT to keep both rows (you can see that the two lists have a slightly different ordering when two results are returned).
One way to solve this would be to understand why you needed to add DISTINCT in the first place and fix your triple patterns so that you no longer need it. DCausse (WMF) (talk) 15:44, 19 August 2025 (UTC)
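A minimal sketch of the failure mode (illustrative only; ?extra stands in for whatever ungrouped variable multiplies the groups in the real query):

SELECT DISTINCT ?item (GROUP_CONCAT(DISTINCT ?class; separator="#") AS ?classes) WHERE {
  VALUES ?item { wd:Q150609 }
  ?item wdt:P31 ?class .     # several values per item
  ?item rdfs:label ?extra .  # ungrouped variable that splits the solutions into groups
}
GROUP BY ?item ?extra

Each distinct ?extra value yields its own group. Every group concatenates the same set of ?class values, but possibly in a different order, so DISTINCT may keep several rows that differ only in the ordering inside ?classes.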
I don't think so. I understand that SPARQL does not guarantee an order for GROUP_CONCAT (my application manually reorders after extraction), but I think I can expect that for each ?item I get one row of other variables, each either sampled or concatenated. I think my problem might be ?label2 appearing in the GROUP BY even though it is SAMPLEd. I will investigate, though I'd expect Blazegraph to catch that. Vicarage (talk) 16:03, 19 August 2025 (UTC)
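If that diagnosis is right, the fix would be to group only by the variables that appear un-aggregated in the SELECT, roughly (a sketch against the query above, untested):

GROUP BY ?item ?itemDescription ?label1

?label2 is already wrapped in SAMPLE(), so also listing it in the GROUP BY splits each item into one group per label value, and ?wikipedia and ?ia are not bound anywhere in this query, so grouping by them does nothing.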

Linking a lexeme to its Wiktionary page

Hello, I notice that it is not possible to link a lexeme (LID) (e.g. akanza) to its Wiktionary page (e.g. akanza) the way this is already possible for an item (QID). Poro26 (talk) 15:36, 22 August 2025 (UTC)

(apologies for writing in English)
Hi @Poro26. It’s not possible to directly link Lexemes (LIDs) to Wiktionary pages the same way Items (QIDs) can be linked. See Wikidata:Wiktionary. The main reason is that Wiktionary pages often cover more than one word, so there isn’t always a clear one-to-one correspondence between a Wiktionary page and a single Wikidata Lexeme.
PS: For future reference, enquiries like this are best placed at Wikidata:Report a technical problem, since this page is specifically for issues with the Query Service and search features. -Mohammed Abdulai (WMDE) (talk) 08:38, 25 August 2025 (UTC)
Hello @Poro26, I have just moved your question to Wikidata:Bistro#Lier_un_lexème_à_sa_page_sur_le_Wiktionnaire, as I think that page is more appropriate for this question. DCausse (WMF) (talk) 08:45, 25 August 2025 (UTC)
Ok Poro26 (talk) 18:24, 25 August 2025 (UTC)
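To see why the correspondence breaks down, a lemma lookup on the query service (a sketch; "akanza" is the word from the question above) can return several lexemes that share the same spelling:

SELECT ?lexeme ?lemma WHERE {
  ?lexeme a ontolex:LexicalEntry ;
          wikibase:lemma ?lemma .
  FILTER(STR(?lemma) = "akanza")   # one Wiktionary page, possibly many lexemes
}

Each match is a separate Lexeme entity (per language and lexical category), while Wiktionary keeps them all on a single page, which is why a one-to-one sitelink is not offered.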

web endpoint truncating

Queries that return lots of rows are being truncated halfway through a line, at 327680 or 1683840 characters, which are suspicious numbers. The same queries return more results through the GUI.

SELECT DISTINCT ?item ?modified
WHERE {

{?item wdt:P31/wdt:P279* wd:Q35509}  # cave

{
  ?wikipedia schema:about ?item.
  FILTER regex(str(?wikipedia), 'wikipedia.org')
}

?item schema:dateModified ?modified .
}

using

curl -s -H "Accept: text/csv" -H "User-Agent: expounder (https://expounder.info)" 'https://query.wikidata.org/sparql' --data-urlencode query="$(cat list.query)" > list.wd
 
 tail -2 /home/john/Expounder/Underfoot/Sites/select/caves/list.wd
http://www.wikidata.org/entity/Q27309689,2024-11-21T17:39:02Z 
http://www.w

and it's inconsistent: changing cave to tunnel (wd:Q44377) lets 980,000 characters through.

Vicarage (talk) 14:46, 27 August 2025 (UTC)Reply
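One way to tell where the truncation happens (a diagnostic sketch, assuming the same graph pattern as the cave query above) is to run a COUNT server-side and compare it with the number of data lines actually received:

SELECT (COUNT(*) AS ?rows) WHERE {
  { SELECT DISTINCT ?item ?modified WHERE {
      ?item wdt:P31/wdt:P279* wd:Q35509 .  # cave
      ?wikipedia schema:about ?item .
      FILTER regex(str(?wikipedia), 'wikipedia.org')
      ?item schema:dateModified ?modified .
  } }
}

If ?rows matches the line count of the downloaded CSV (minus the header), the result set arrived intact; a shorter file points at the transfer being cut off rather than the query failing.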

Hi,
I was not able to replicate this issue. In all the tests I've done, the CLI and the GUI returned consistent result sets.
If the issue persists, could you run curl with higher verbosity (-vv) and share the output? GModena (WMF) (talk) 10:31, 2 September 2025 (UTC)
Still happening after over a week. Today it first timed out, then gave truncated results after 50 seconds with
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
* Trying 2a02:ec80:300:ed1a::1:443...
  • TCP_NODELAY set
  • Connected to query.wikidata.org (2a02:ec80:300:ed1a::1) port 443 (#0)
  • ALPN, offering h2
  • ALPN, offering http/1.1
  • successfully set certificate verify locations:
  • CAfile: /etc/ssl/certs/ca-certificates.crt
CApath: /etc/ssl/certs
} [5 bytes data]
  • TLSv1.3 (OUT), TLS handshake, Client hello (1):
} [512 bytes data]
  • TLSv1.3 (IN), TLS handshake, Server hello (2):
{ [122 bytes data]
  • TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
{ [19 bytes data]
  • TLSv1.3 (IN), TLS handshake, Certificate (11):
{ [2768 bytes data]
  • TLSv1.3 (IN), TLS handshake, CERT verify (15):
{ [80 bytes data]
  • TLSv1.3 (IN), TLS handshake, Finished (20):
{ [36 bytes data]
  • TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
} [1 bytes data]
  • TLSv1.3 (OUT), TLS handshake, Finished (20):
} [36 bytes data]
  • SSL connection using TLSv1.3 / TLS_AES_128_GCM_SHA256
  • ALPN, server accepted to use h2
  • Server certificate:
  • subject: CN=*.wikipedia.org
  • start date: Aug 10 23:56:29 2025 GMT
  • expire date: Nov 8 23:56:28 2025 GMT
  • subjectAltName: host "query.wikidata.org" matched cert's "*.wikidata.org"
  • issuer: C=US; O=Let's Encrypt; CN=E6
  • SSL certificate verify ok.
  • Using HTTP2, server supports multi-use
  • Connection state changed (HTTP/2 confirmed)
  • Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
} [5 bytes data]
  • Using Stream ID: 1 (easy handle 0x559f9c205dc0)
} [5 bytes data]
> POST /sparql HTTP/2
> Host: query.wikidata.org
> accept: text/csv
> user-agent: expounder (https://expounder.info)
> content-length: 1714
> content-type: application/x-www-form-urlencoded
>
} [5 bytes data]
  • We are completely uploaded and fine
{ [5 bytes data]
  • TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
{ [249 bytes data]
  • TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
{ [249 bytes data]
  • old SSL session ID is stale, removing
{ [5 bytes data]
  • Connection state changed (MAX_CONCURRENT_STREAMS == 100)!
} [5 bytes data]
100  1714    0     0  100  1714      0    197  0:00:08  0:00:08 --:--:--     0
< HTTP/2 200
< server: nginx/1.18.0
< date: Tue, 02 Sep 2025 11:07:26 GMT
< content-type: text/csv;charset=utf-8
< content-disposition: attachment; filename=query1455078.csv
< x-first-solution-millis: 7
< x-served-by: wdqs1011
< access-control-allow-origin: *
< access-control-allow-headers: accept, content-type, content-length, user-agent, api-user-agent
< cache-control: public, max-age=300
< age: 4
< vary: Accept, Accept-Encoding
< x-cache: cp3069 miss, cp3069 pass
< x-cache-status: pass
< server-timing: cache;desc="pass", host;desc="cp3069"
< strict-transport-security: max-age=106384710; includeSubDomains; preload
< report-to: { "group": "wm_nel", "max_age": 604800, "endpoints": [{ "url": "https://intake-logging.wikimedia.org/v1/events?stream=w3c.reportingapi.network_error&schema_uri=/w3c/reportingapi/network_error/1.0.0" }] }
< nel: { "report_to": "wm_nel", "max_age": 604800, "failure_fraction": 0.05, "success_fraction": 0.0}
< x-client-ip: 2a0a:ef40:9ec:9d01:6539:d50c:9686:6497
< set-cookie: WMF-Uniq=dT5jT3d61u7QwurXwlpVTQJiAAAAAFvdmI6ubr0VowiIZp7-3bCDysD9G5KdipaX;Domain=.wikidata.org;Path=/;HttpOnly;secure;SameSite=None;Expires=Wed, 02 Sep 2026 00:00:00 GMT
<
{ [13833 bytes data]
100 321k 0 320k 100 1714 6516 34 0:00:50 0:00:50 --:--:-- 0
  • Connection #0 to host query.wikidata.org left intact
50.29
5233 results for caves. Vicarage (talk) 11:09, 2 September 2025 (UTC)
A couple of years ago I was looking into various failure modes. One of them is when the query completes but fails during transfer. This can AFAIK only be detected when the JSON doesn't validate; if you're requesting data in CSV, there may be no way to tell. From the above it looks like the query started returning results 8 seconds in, then aborted 50 seconds later, which fits with the 60-second timeout, so the truncated data is as expected. If it takes a minute to transfer 320k worth of data on your internet connection, it sounds like it might be throttled. Infrastruktur (talk) 18:06, 9 September 2025 (UTC)
I'm on full fibre, so it's an upstream problem. Bigger, very similar queries work, but I've also seen other queries fail in exactly this way. Vicarage (talk) 18:47, 9 September 2025 (UTC)
Have you tried the query on QLever's Wikidata endpoint? See https://github.com/dpriskorn/WikidataOrcidScraper/blob/master/models/qlever.py for how to make the request. So9q (talk) 04:14, 10 September 2025 (UTC)
The last time I used QLever via curl, a few months back, I had assorted problems, ending with the service never responding at all. The level of documentation and support seemed poor. Perhaps when I return from holiday. Vicarage (talk) 06:25, 10 September 2025 (UTC)

false positive / db inconsistency

Q137041536#P8322 complains that Eggendorf (Q18755525) violates the uniqueness constraint for the property cadastral municipality ID in Austria (P8322). Yesterday I removed the property from the other item [1], and even after purging both items the constraint violation persists. Even more, when using SPARQL to find items with both IDs (https://w.wiki/GLRg), Eggendorf (Q18755525) is still listed, although the user interface no longer shows the property cadastral municipality ID in Austria (P8322). best --Herzi Pinki (talk) 19:17, 27 November 2025 (UTC)

(Moving from the parent page to /WDQS and Search because it sounds like a query service issue – the constraint report also gets the information for this constraint from the query service.) --Lucas Werkmeister (WMDE) (talk) 10:43, 28 November 2025 (UTC)
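A direct way to check what the query service currently stores (a minimal sketch using the property and items from the report above):

SELECT ?item ?value WHERE {
  VALUES ?item { wd:Q18755525 wd:Q137041536 }
  ?item wdt:P8322 ?value .   # cadastral municipality ID in Austria
}

If an item still comes back with a value that the user interface no longer shows, the WDQS copy of that item is stale, and the constraint report, which reads from the query service, will keep flagging the violation until the data is refreshed.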